Preface


The process of education involves three steps: (1) determining 
objectives, (2) providing experiences designed to achieve the objec- 
tives, and (3) measuring and evaluating the results to determine 
if the objectives have been achieved. 


Although measurement and evaluation is an important part of 
education, most teacher-training institutions in the United States 
do not require prospective teachers to take a course in the subject. 
Many of these institutions do provide instruction in the subject as 
part of some larger course which includes a unit on measurement 
and evaluation together with units on other aspects of education 
(e.g., principles, methods, curriculum, educational psychology).
Instructors who teach such units are often reluctant to require their 
students to purchase one of the standard texts on measurement 
and evaluation, since it is difficult to justify the expense in view of 
the relatively small amount of time spent in studying the subject. 


This book has been written to meet the needs of the instructors 
and the students of courses which include a unit on measurement.
It contains concise chapters on all of the topics which are of most 
importance to classroom teachers. The criterion employed in deciding
what to include was simply: Is this topic important for classroom
teachers? If the answer was yes, the topic was included.
Since classroom teachers make more use of measuring instruments
which they devise themselves than they do of standardized tests or
inventories, a majority of the space has been devoted to this aspect 
of measurement and evaluation. 

The typical classroom teacher has relatively little need for sta- 
tistics. This phase of measurement therefore has been minimized. 
It has not been neglected, however, since a minimum of statistical 
concepts and techniques necessary for summarizing grades and 
for interpreting scores on standardized tests has been included as 
an integral part of other chapters. 


The book is not an outline on measurement; rather, it is a short 
self-contained text. The annotated bibliography which is included 
describes the major standard-sized texts in the area so that students 
who wish to pursue a topic more fully may easily locate further
text materials. 


The material in the book has been used, in a preliminary mimeo- 
graphed version, by several hundred students at Los Angeles State 
College. We are indebted to our colleagues, Professors Prudence 
Bostwick, Marian Wagstaff, Morris Better, Bob Forbes, Roly 
Hahn, Sam Jones, George Kibby, Ray Pitts, and Julian Roth, and
to their students for their many helpful suggestions. 


We are deeply grateful to Dr. Lucien B. Kinney of Stanford University
for his careful reading of the manuscript in its preliminary
form, and for his many helpful suggestions. We are also indebted
to Dr. John A. Dahl of Los Angeles State College for reading the
manuscript in its final form.


c. W. B. 


San Gabriel, California 
April 29, 1957 


Contents


CHAPTER

1. Introduction to Educational Measurement and Evaluation

2. Evaluating Achievement with Teacher-devised Short-answer Tests

3. Evaluating Achievement with Teacher-devised Essay Tests

4. Evaluating Achievement Through Products and Performances

5. Evaluating Typical Behavior with Teacher-devised Instruments

6. Summarizing and Reporting Pupil Achievement and Typical Behavior

7. Evaluating Achievement with Standardized Tests

8. Evaluating Abilities with Standardized Tests

9. Evaluating Interest and Adjustment with Standardized Instruments

10. How to Select a Standardized Test or Inventory


APPENDIX A

Annotated Bibliography on Measurement and Evaluation


APPENDIX B

More About Validity and Reliability


Index


CHAPTER 1 


Introduction to Educational 
Measurement and Evaluation 


Meaning of measurement and evaluation 


The word measurement means “the act or process of ascertaining 
the extent or quantity of something.” Evaluation refers to “the 
act or process of determining the value of something.” Evaluation 
depends upon, but is not synonymous with, measurement. Evaluation
goes beyond measurement in answering the question: Is the
obtained measure desirable or undesirable?

Courses in the subject covered by this book have, in previous 
years, been referred to as Tests and Measurements, or simply
Measurement. The emphasis was on tests and the statistical manipulation
of the test results. In recent years the scope of such courses
has been broadened to include many non-test techniques, such as 
observation, sociograms, and anecdotal records, which are used to 
supply a more complete picture of the pupil—his status and his 
progress. The word evaluation has become associated with this 
broadened scope and implies the use of non-test techniques as well
as tests. 

When a tire gauge registers twenty-four pounds of air pressure 
in a tire, this constitutes a measurement and of itself indicates a 
situation neither desirable nor undesirable. If the recommended 
pressure is twenty-four pounds, “everything is as it should be.” On 
the other hand, if normal inflation is thirty pounds, then “something
is wrong.” Deciding that “something is wrong” is an evaluation
based on the evidence obtained from the measurement. The
evaluation continues as possible causes for the undesirable situa- 
tion suggest themselves: (1) The gauge may be wrong; (2) there 
may be a leak in the tire; (3) someone may have let air out of 
the tire; and (4) the air pressure may be low for some other reason. 
When the reason for the low reading is discovered, appropriate 
action is undertaken. 

In education much the same kind of process as that described 
above occurs. A pupil has a reading grade placement of 2.8—that 
is, he reads as well as the average pupil in the eighth month of 
the second year of school. This fact represents evidence obtained 
through a measurement and is neither desirable nor undesirable in 
itself. If the pupil is in the fifth grade and of normal intelligence, 
the teacher knows that “something is wrong.” The teacher then 
seeks possible reasons for the discrepancy between the pupil’s actual 
reading level and the level indicated by his grade placement and 
intelligence. Appraising the evidence obtained from the measure- 
ment of the pupil's reading ability is part of the evaluation process. 
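
The distinction drawn in the two illustrations above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Judgment`, `evaluate`, and `grade_placement` are hypothetical, and the expected reading level of 5.0 for a fifth-grade pupil is an assumed standard, not a figure given in the text. The point is that a measure becomes an evaluation only when it is compared with a standard.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    measure: float          # what the instrument reported
    expected: float         # the standard it is judged against (an assumption)
    gap: float              # how far the measure falls short of the standard
    needs_follow_up: bool   # the evaluation: is "something wrong"?


def evaluate(measure: float, expected: float, tolerance: float = 0.0) -> Judgment:
    """Turn a bare measurement into an evaluation by comparing it with a standard."""
    gap = expected - measure
    return Judgment(measure, expected, gap, gap > tolerance)


def grade_placement(year: int, month: int) -> float:
    """Grade placement such as 2.8: the eighth month of the second year."""
    return year + month / 10


# Tire gauge: 24 pounds is neither good nor bad until a standard is named.
tire_ok = evaluate(24, expected=24)
tire_low = evaluate(24, expected=30)

# Reading: a placement of 2.8 judged against an assumed fifth-grade standard.
reader = evaluate(grade_placement(2, 8), expected=grade_placement(5, 0))
```

The same measure of 24 pounds yields two different evaluations, depending only on the standard supplied. Finding the cause of a shortfall remains a matter of judgment that the sketch does not attempt.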


Uses of measurement and evaluation in guidance 


Our system of education recognizes the fact that all pupils are 
different and will play different roles in society. Therefore, 
although our society determines the general objectives of education, 
specific objectives are influenced by the capabilities of the individ- 
ual pupil. Determining what objectives are reasonable for pupils is 
the responsibility of educators, parents, and the pupils themselves. 
This aspect of education is referred to as guidance. A sound choice 
of objectives depends upon sound information about the pupils’ 
abilities, interests, attitudes, and character. This information is 
obtained through use of the techniques of measurement and 
evaluation.


Guidance is concerned with the answers to such questions as: 


Should Johnny repeat the fourth grade? 


Should Mary take a college preparatory major or a business 
major? 


Should Bill elect woodshop or orchestra? 


Should Ann be transferred from Miss Smith’s room to Mrs.
Jones’ room? 


Uses of measurement and evaluation in instruction 


Within the individual classroom each teacher utilizes measure- 
ment and evaluation for one or more purposes. 

1. To reveal the stage at which pupils have arrived in the 
learning process. 

All teachers find it is necessary to pause now and then to survey 
the job confronting them. This survey or evaluation may occur 
during the initial stages of a semester, or unit of work, to deter- 
mine at what level the instruction should start and the necessity 
for review of previous learnings. Evaluation at this stage allows 
for an investigation of the spread of ability in the class, thus 
identifying gifted students capable of an enriched program as 
well as the pupils who need remedial work. 

Evaluation part-way through the unit of work permits the 
teacher to appraise the extent to which pupils are progressing 
toward the goals of instruction. Appraisal here allows one to 
determine whether the instructional pace can be accelerated or 
whether some reteaching is necessary. 

Evaluation at the conclusion of a unit of instruction occurs 
almost automatically in most classrooms. Here again the purpose 
is to determine the stage at which the pupils have arrived. Results 
of the evaluation indicate whether a satisfactory level of achievement
has been reached and identify areas necessitating reteaching
and review. Information relative to readiness for the
next topic can also be obtained from the appraisal at the end
of a unit.

In the process of discovering the pupil’s current status, it is 
necessary to reveal sufficient information to the class so that stu- 
dents may engage in appropriate self-appraisal. In this way the 
evaluation process may serve to motivate pupils to do better work. 
Motivation alone does not constitute sufficient cause for evaluation, 
and those who use tests primarily for this purpose are probably 
utilizing inadequate teaching methods as well as inadequately 
utilizing the test data. 

An ideal teaching-learning situation often develops after an 
evaluation technique has been used. Since the pupil has some ego- 
involvement in his response to a test item, a discussion of these 
responses finds pupils eager to defend their stand, with effective
class interaction resulting. The alert teacher uses these discussions 
to dispel pupil misconceptions, identify misunderstandings, and also 
to identify poor test items. 

2. To determine the effectiveness of instruction and planned 
activities. 

Success in the classroom depends to a large extent upon the ade- 
quacy of the teacher's plan for class activities. Unless there exists 
within the plan provision for evaluation, the effectiveness of these 
planned activities remains a mystery. Therefore, teachers are found 
reusing techniques and methods which, unknown to them, are 
extremely ineffective. At the same time, particularly strong tech- 
niques may be discarded for lack of validation. 

Professional people are personally responsible for their own pro- 
fessional growth—that is, improving and validating their methods. 
Evaluation serves a real purpose in providing them with techniques 
for doing so. As an example, a junior high school teacher recently 
became interested in group dynamics as a teaching technique. He 
decided to use this approach in a unit. He identified specific objec- 
tives, organized materials, and planned his procedures. At the close 
of the unit, through the employment of recognized evaluation tech- 
niques, he discovered the group dynamics approach particularly suc- 
cessful for his purposes. The results encouraged him to use the 
technique further, but to adjust it slightly to better fit the local 
school population. In the process he also identified areas of instruc- 
tion which needed further study, as well as individuals in the class 
who would benefit from remedial work. 

3. To serve as a basis for summarizing and reporting pupil 
progress. 

Almost all schools require teachers periodically to summarize and 
report pupils’ progress. These summaries are recorded in perma- 
nent records of the school and are reported to parents either in the 
form of report cards, written reports, or other ways. Often the sum- 
maries are stated in grades or marks. Parents utilize these summary 
reports as an aid in understanding and guiding their children. 
School personnel utilize the summary reports as an aid to guidance 
and as a source of information when questions regarding pupils 
arise in connection with enrollment in advanced or remedial classes, 
job placement, and entrance to college. Since crucial decisions are 
frequently based on summary reports, the reports should be fair
and accurate. Through the use of measurement and evaluation 
techniques, teachers can obtain sufficient information to achieve
this end.

4. To throw light on the feasibility and practicability of stated
objectives.

An appraisal of evaluative data sometimes results in review of
aims rather than of procedures. Consider the zealous elementary
arithmetic teacher who attempted to teach the multiplication proc- 
ess to the extent that all students achieved 100-percent accuracy. 
Through careful research, she acquired information on the best 
teaching techniques and the most effective teaching materials. After 
adapting both the techniques and materials to fit her own situation, 
she proceeded to teach multiplication. After a reasonable period of 
time, a testing program disclosed that the pupils had not achieved 
100-percent accuracy in multiplication. She then provided remedial 
sessions, modified her planned experiences, and devised new activ- 
ities. At the conclusion of these experiences, although much prog- 
ress had been made, she was still short of the goal. In evaluating 
the results of her efforts, she might well conclude that since she had 
confidence in her own ability and since the techniques and materials 
were carefully selected, possibly the chosen goal was unrealistic and 
impractical in view of the time and effort required to achieve it. 


Relationship between objectives, activities, and evaluation 


The relationship between evaluation and the entire instructional 
process is revealed by examining the steps in the instructional proc- 
ess. These are: 

1. Establishing objectives to be achieved.

2. Providing experiences and activities expected to contribute to
the achievement of the objectives.

3. Evaluating to be sure that the desired results have been
achieved.

These three steps can be utilized to systematize and organize a 
unit in a course or even an entire course. A plan sheet set up with 
three columns, each headed by one of the three instructional steps,
can provide the framework for course development that will guarantee
consideration of the key processes. An example of a partial
framework for a seventh-grade arithmetic course appears in Table 1.


Table 1  Partial Framework for 7th Grade Arithmetic Course


Row 1

Objectives. The pupil:
1. Exhibits an appreciation for mathematics by:
   1.1 Working on mathematical recreations during his leisure time.
   1.2 Asking questions relative to uses of mathematics in our society.

Activities. The teacher:
1. Will introduce puzzles and recreational materials to the class and
   encourage them to work on them.
   Will organize field trips to industries which use mathematics
   extensively.

Evaluation:
1. Observation using check list and anecdotal records.


Row 2

Objectives. The pupil:
2. Exhibits insight into and understanding of the mathematical processes.
   2.1 Can explain the rationale behind the processes.
   2.2 Can explain the relationship between the processes, such as
       division being a short cut for subtraction.

Activities. The teacher:
2. Will develop mathematical ideas through the use of concrete
   experiences.
   Will build new ideas upon concepts already understood, and will
   provide many opportunities for pupils to explain the “why” of the
   processes.

Evaluation:
2. Observation: the teacher will listen as pupils “think aloud” in
   working problems.


Row 3

Objectives. The pupil:
3. Exhibits facility in the processes of arithmetic.
   3.1 Can compute with reasonable skill and accuracy using whole
       numbers, fractions, and decimals.
   3.2 Can work problems involving per cents.

Activities. The teacher:
3. Will provide many opportunities for pupils to work exercises, play
   mathematical games, and drill on weaknesses in computation.

Evaluation:
3. Written tests: standardized, teacher-made, diagnostic.


Row 4

Objectives. The pupil:
4. Can apply arithmetic skills to life situations.
   4.1 Solves problems that arise in his own experiences.
   4.2 Can read and construct graphs and tables with understanding.

Activities. The teacher:
4. Will assign problems which are of interest to children at this age.
   Will provide problems which grow out of activities in other classes
   such as social studies, shop, and physical education.

Evaluation:
4. Written tests with word problems. Observation. Interviews.


Note that the second and third columns above can be determined
logically when the objectives in the first column are stated behavior- 
ally, and further that the specific evaluation techniques relate 
directly to the particular pupil behavior desired. 
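
A plan sheet of this kind can also be kept as a small data structure, which makes it easy to check that no objective has been left without activities or without a means of evaluation. The sketch below is a minimal illustration, not a prescribed format: `PlanRow` and `incomplete_rows` are hypothetical names, the wording of the rows is abbreviated from Table 1, and the third row is deliberately left without an evaluation entry to show what the check reports.

```python
from dataclasses import dataclass, field


@dataclass
class PlanRow:
    """One row of the three-column plan sheet."""
    objective: str
    activities: list[str] = field(default_factory=list)
    evaluation: list[str] = field(default_factory=list)


def incomplete_rows(plan: list[PlanRow]) -> list[str]:
    """Return the objectives that lack either activities or evaluation techniques."""
    return [row.objective for row in plan
            if not row.activities or not row.evaluation]


plan = [
    PlanRow("Exhibits an appreciation for mathematics",
            ["Introduce puzzles and recreational materials"],
            ["Observation using check list and anecdotal records"]),
    PlanRow("Exhibits facility in the processes of arithmetic",
            ["Provide exercises, mathematical games, and drill"],
            ["Written tests"]),
    # Evaluation intentionally omitted here to demonstrate the check.
    PlanRow("Can apply arithmetic skills to life situations",
            ["Assign problems of interest to children at this age"]),
]

missing = incomplete_rows(plan)
```

Here `missing` names the one objective for which no evaluation technique has yet been planned.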


The key steps in the evaluation process 


In the process of evaluation, the teacher or evaluator raises four 
questions which determine the steps of the process. 

1. What would he see if the objectives were realized? This step
involves stating specific objectives and emphasizes that when
objectives are realized, some observable evidence must be available.

2. Where and in what situations would he see the evidence? The
place and time that the evaluator would note this observable evidence
must be identified.

3. How can he get a record of the evidence? The process for collecting
and organizing the evidence must also be identified.

4. How would he appraise the evidence, or what is its significance?
This final step asks what the evidence means and implies
that some action is taken in light of it.

The manner in which these four questions are answered is illus- 
trated by the following examples. 

A young man wishes to buy a used car and locates one which out- 
wardly seems to fit his needs. Before buying he investigates many 
aspects of the car, or to put it another way, he evaluates it. 

First, he asks, what does he want in a car, or what would he see
if this were the car he wanted? The answers to these questions
make up his objectives. He might want such things as:

1. Efficient performance.

2. Attractive appearance.

3. Accessories.

4. Appropriate sales price.

Secondly, he asks, where would he see evidence related to these 
objectives? The careful buyer would not accept the advertisement 
or sales talk at face value. He would turn directly to the automo- 
bile to collect most of this evidence. Some of the characteristics, 
such as the paint, he could observe while the car was in the car lot. 
In other cases he would plan a situation to obtain evidence. For 
example, to determine performance, he would drive the car and in
the process might choose to drive up a steep hill or over a rough 
street, or perhaps he would drive it for some distance in second 
gear. Note that to collect some evidence he contrives a situation 
which is not the usual. 

For the third step he needs to record the evidence he has decided 
to collect and asks himself how can he economically and efficiently 
collect these data. Some of his questions could be answered by 
direct observation, and he could record these responses on a check 
list or by making a note of his findings. Evidence relating to gas 
mileage or oil consumption can be obtained by measuring the gaso- 
line and oil before and after a trip. We might note further that if 
he is skilled and has the correct equipment, he might perform one 
or two tests and obtain some of the information much more efficiently.
For example, a compression check might indicate in a few
minutes more information about the car’s gas and oil consumption 
than the buyer could discover in a 100-mile trip. This is analogous 
to many educational tests which provide a short-cut method of 
collecting data on pupils. 

To be useful, the collected information must not only be recorded 
but organized. The car buyer looks at several cars and wishes to 
choose among them; he will need records that permit a valid com- 
parison. A check list might be one method for the car buyer to 
record and organize his data. 

Finally, when he has collected all the information he can in the 
time allowed, he must appraise the data and make a decision. He 
either buys the car, bargains for an adjustment of price, or rejects 
it entirely. Note here that he attaches a value to the evidence he 
obtained and takes appropriate action. 

To relate these steps specifically to educational evaluation con- 
sider the following illustration. 

A teacher assumes the goal for her class, “the development of
good work habits.” This very general objective might be included 
in the list of aims for almost any class or course. She then asks the 
question: What would she see if her pupils displayed good work 
habits? After some consideration, she listed the following: 

1. Promptness in reporting to class.

2. Bringing books and other school supplies to class.

3. Completing assignments on time.

4. Organizing the job to be done.

5. Efficient budgeting of time.

What she actually did was to express the objective, “the development
of good work habits,” in terms of observable pupil behavior.
Expressing objectives in this way is often referred to as the opera- 
tional definition of aims. 

As a second step, situations are planned where this pupil behavior 
may be observed. Such things as reporting to class promptly and 
bringing books can be observed during the normal course of school 
work. To determine whether pupils complete assignments on time 
the teacher may assign a variety of jobs to be completed, including 
committee reports, individual projects, and the like. Actually, all 
teacher assignments serve in part as contrived situations from which 
evidence can be obtained for evaluation purposes. 

As a natural next step, the teacher will determine how she can 
obtain a record of the evidence.

Promptness in reporting to class can be observed and recorded. 
Also, a periodic check will reveal whether pupils have brought their 
books and other school supplies. Some record should be made of the 
occasions when a pupil has not provided these items. Failure to 
hand in school assignments on time should also be systematically 
recorded each time it occurs. 
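
The systematic record described above amounts to a running tally for each pupil. The following sketch shows one way such a tally might be kept, assuming a simple log of (pupil, lapse) entries. The pupils, the lapse labels, and the function name `tally` are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical log: one entry each time a lapse in work habits is observed.
log = [
    ("Ann", "late to class"),
    ("Bill", "no books"),
    ("Ann", "late assignment"),
    ("Ann", "late to class"),
    ("Bill", "late assignment"),
]


def tally(entries):
    """Count each kind of lapse separately for every pupil."""
    counts = {}
    for pupil, lapse in entries:
        counts.setdefault(pupil, Counter())[lapse] += 1
    return counts


summary = tally(log)
for pupil, lapses in sorted(summary.items()):
    print(pupil, dict(lapses))
```

A tally of this kind supplies only the evidence. Appraising it and deciding what action to take is the final step of the process.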

Finally, when the teacher has recorded the evidence on work 
habits, she is ready to appraise the evidence and act accordingly. 
The action taken in this case might take several forms. Probably 
the data for each student will need to be summarized, a mark 
assigned, and a report made on a form or card. Perhaps a reorgan- 
ization of the instructional approach to supply additional training 
in the development of work habits is necessary. It may be that 
insufficient data were collected, implying a need to revise the evalua- 
tion techniques. In any event, the point is that some appropriate 
action must be taken in light of the evidence obtained; otherwise
the work and planning involved in collecting the evidence has no
purpose.


Essential characteristics of measurement procedures 
used in evaluation 


Although measurement is never an end in itself, sound measure- 
ment is a prerequisite to sound evaluation. Correct decisions cannot 
be made based on faulty evidence. Regardless of which particular
measurement technique is employed, there are three questions
regarding the technique which should be answered affirmatively: 

1. Does the technique obtain valid evidence? 

2. Does the technique obtain reliable evidence? 

3. Is the technique practical and economical? 


The degree to which a measurement technique obtains the kind 
of evidence which its user intends it to collect is the measure of its 
validity. A test composed of computation items in sixth-grade arith- 
metic is a valid test of achievement in sixth-grade arithmetic com- 
putation. However, it is not a valid test of the pupils’ ability to 
apply the computation to the solution of word-problems. A valid 
test of this ability would include items requiring the solution of 
word-problems.¹

The degree to which a measurement technique obtains accurate 
and consistent evidence is the measure of its reliability. For exam- 
ple, if a desk is measured with a yardstick by two different persons, 
it is reasonable to expect both persons to arrive at very similar 
answers. However, if a child’s intelligence is determined by two 
different psychologists on succeeding days, there may be a 10- or 
15-point difference between the I.Q.’s determined by the two psychologists.
Since the intelligence of the child does not change over
such a short period, the difference in the I.Q.’s must be attributed to
lack of precision in the measuring instrument. The terms precision 
and consistency are actually two words which express the same idea. 
Thus, if a yardstick is used to measure height to the nearest half 
inch, then it is reasonable to expect that the measures would be 
precise and the heights of the students obtained by one teacher 
would be consistent with the heights obtained by a second teacher.²
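
The contrast between the yardstick and the intelligence test can be made concrete by comparing two sets of measurements of the same pupils. The sketch below uses the average absolute difference between two observers as a rough indication of consistency. This is an illustrative choice only, not the reliability coefficient found in test manuals, and all of the figures are invented for the example.

```python
def mean_abs_difference(first, second):
    """Average absolute disagreement between two sets of paired measurements."""
    if len(first) != len(second) or not first:
        raise ValueError("both observers must measure the same, non-empty group")
    return sum(abs(a - b) for a, b in zip(first, second)) / len(first)


# Hypothetical heights in inches, measured to the nearest half inch.
heights_teacher_a = [54.0, 57.5, 60.0, 52.5]
heights_teacher_b = [54.0, 57.0, 60.0, 52.5]

# Hypothetical I.Q.'s obtained by two psychologists on succeeding days.
iq_psychologist_a = [100, 112, 95, 120]
iq_psychologist_b = [112, 100, 108, 106]

height_disagreement = mean_abs_difference(heights_teacher_a, heights_teacher_b)
iq_disagreement = mean_abs_difference(iq_psychologist_a, iq_psychologist_b)
```

With these invented figures the two teachers differ by an eighth of an inch on the average, while the two psychologists differ by more than twelve points. The disagreement must also be judged against the size of the unit and the range of the scale.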

The third essential characteristic of a measuring instrument 
relates directly to its usefulness. Obviously an evaluation technique
must be sufficiently economical, costwise, so that the school can 
afford it, and also timewise, so that the teacher, with all her respon- 
sibilities, can carry it through. It is for these reasons that paper- 
and-pencil tests have achieved their popularity. Tests can be admin- 
istered to groups, conveniently scored, and interpreted according to 
a standard. However, as stated previously, many objectives cannot 
be evaluated by means of a written test, and thus many other tech- 
niques for evaluation have been developed. 


1 For a further explanation of validity see Appendix B, page 109. 
2 For a further explanation of reliability see Appendix B, page 110. 



Evaluation is comprehensive and continuous 


Evaluation is comprehensive because evidence is obtained regard- 
ing pupils’ abilities, interests, health, adjustment, achievement, 
character—in fact, every aspect of the total personality. This evi- 
dence is used to guide pupils and to judge pupils’ progress. It is 
also used to evaluate the quality of the educational program 
offered to the pupils. 

Evaluation is continuous because every action of the pupil is 
a part of the evidence which the teacher gathers in order to better 
understand the pupil. Evaluation is not limited to the weekly test 
or the final examination. Every question the pupil asks, every 
assignment the pupil completes, in short, everything which he 
does, in and out of the classroom, contributes to the total evidence
which the teacher gathers. To collect the many different
kinds of evidence requires the use of a variety of measurement 
techniques. The most common and useful of these techniques are 
discussed in the following chapters of this book. 


EXERCISES 


1. What are some measurable pupil behaviors which you consider 
to reflect “good citizenship” at the third-grade level? At the 
twelfth-grade level? 

2. What are the dangers in attempting to evaluate “good citizen- 
ship” without translating it into pupil behavior? 

3. State some general objective in your teaching area and translate 
the general objective into specific measurable pupil behavior. 

4. What reasons might account for the fact that a child in the fifth 
grade tests at the second-grade level in reading? How would you 
determine which of these possible reasons was actually correct? 

5. Select three or four objectives from a course of your choice and 
fill out the following plan sheet. In the objective column, state 
the objectives in terms of specific measurable pupil behaviors. 
In the activities column, list the activities which you would pro- 
vide to accomplish the objectives. In the evaluation column, list 
the various methods by which you could determine whether 
the objectives had been achieved. 

Objectives          Activities          Evaluation


SUGGESTED ADDITIONAL READINGS


CHAPTER 2


Evaluating Achievement with 
Teacher-devised Short-answer Tests 


The short-answer type of test consists of a collection of items 
which can be answered by selecting the correct response from a 
number of possible answers supplied by the test-maker or by 
supplying the correct answer to a question in a few words or 
symbols. Items of this type are sometimes called objective items 
because, by reference to an answer key, the pupil’s answers may be 
scored objectively as right or wrong. 

Short-answer tests are only one of many measurement tools utilized
by the teacher in the evaluation process. In the course of a
semester's work, the teacher may utilize short-answer tests, essay
tests, discussions, observations, term papers, oral reports, and other
means of evaluating her pupils’ progress as well as her own teaching
effectiveness. Before constructing a short-answer test, therefore, it is
necessary to decide whether this type of test is the proper measuring
device to use. Sometimes, after consideration, it will be apparent
that a short-answer test is not appropriate for evaluating the particular
objectives in question.


Planning the short-answer test 


After deciding that a short-answer test is appropriate for meas- 
uring the objectives under consideration, the next step is to plan 
the test. At this stage the objectives to be measured by the test must
be considered. Although tests are sometimes written to evaluate a 
single objective of instruction, they usually cover several different 
ones. When this occurs, it is necessary to decide what portion of the 
test shall be assigned to each of the separate objectives. Unless this 
is done, the portions of the test devoted to the various objectives 
may be completely out of line with the relative emphasis which 
should be placed on each of the objectives. Consider a social 
studies unit on South America which has as one of its objectives the 
knowledge of the principal products of the various countries. 
Although a knowledge of these products might constitute only 10 
percent of the stated objectives of the unit, a test on the unit might 
include an excessive number of items measuring this objective be- 
cause of the relative ease with which such items can be constructed. 

Both the relative weight of the various objectives of the unit or 
material covered by the test and the relative weight of the various 
content areas covered by the unit must be considered in writing the 
test. If in the social studies unit cited 10 percent of the time and
effort of the class has been devoted to studying Argentina, it would 
seem reasonable to expect approximately 10 percent of the items on 
the test to deal with Argentina. A test which either neglected 
Argentina entirely or included 30 percent of the items on Argentina 
would not be reflecting the time assigned to this aspect of the unit. 
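
The comparison suggested above, between the share of class time given to a content area and the share of test items devoted to it, is simple arithmetic and can be checked mechanically. The sketch below is one possible way to do so. Only the 10 percent of time for Argentina and the 30 percent of items come from the example in the text. The other areas, the 50-item total, the 5-point tolerance, and the function name `out_of_line` are assumptions made for illustration.

```python
def out_of_line(time_share, item_counts, tolerance=0.05):
    """Return content areas whose share of items departs from their share of time.

    time_share  maps each area to its fraction of class time (fractions sum to 1).
    item_counts maps each area to the number of items written for it.
    """
    total_items = sum(item_counts.values())
    flagged = {}
    for area, share in time_share.items():
        item_share = item_counts.get(area, 0) / total_items
        if abs(item_share - share) > tolerance:
            flagged[area] = round(item_share, 2)
    return flagged


# Argentina received 10 percent of the time but 15 of 50 items (30 percent).
time_share = {"Argentina": 0.10, "Brazil": 0.30, "Other countries": 0.60}
item_counts = {"Argentina": 15, "Brazil": 15, "Other countries": 20}

flagged = out_of_line(time_share, item_counts)
```

In this invented case Argentina is over-represented and the remaining countries are under-represented, while Brazil is in line with the time spent on it. The tolerance is a matter of judgment; the text asks only for approximate agreement.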

Professional test-makers often develop a “blueprint” for the test 
which specifies the exact percentage of items according to content 
and objective. The teacher usually will not go into as much detail 
as the professional test constructor. There is, however, a clear neces- 
sity to plan the emphasis in the items according to the objectives to 
be measured and the content to be included. In no event should 
the test-maker just start writing items. Since items are easier to 
write in some areas than in others, a test constructed in this way can 
only by the very sheerest coincidence correspond to the test which 
would have been devised by a prior consideration of the objectives 
and content. It is not necessary to spend the time and effort on an 
elaborate plan or “blueprint” for each test. A simple test specifica- 
tion, such as the one on page 16, will result in a test far better than 
one written with no specifications. 

The specifications for the fifty-minute arithmetic test can also be 
represented by the “blueprint” on page 16, which combines the 
information regarding objectives and content. 
Table 2 Specifications for a Fifty-minute Arithmetic Test 

Allocation of items    Items measuring ability to compute           40% 
by objective           Items measuring ability to use computation 
                       in problem solving (word problems)           60% 

Allocation of items    Fractions                                    25% 
by content             Decimal fractions                            25% 
                       Percents                                     50% 


It is obvious that more detailed specifications could be written. 
For example, the 25 percent allotted to fractions could be subdi- 
vided into addition of fractions, multiplication of fractions, and so 
forth. Such refinement in the specifications would be more desir- 
able and would result in an improved test. However, any specifica- 
tions are better than no specifications at all, and all test-makers 
should make some attempt at “blueprinting” their tests before writ- 
ing test items. 


Table 3 Blueprint for a Fifty-minute Arithmetic Test 

                                      Objectives 
Content                    Computation    Solving Word-problems 
                              (40%)              (60%) 
Fractions (25%)                10%                15% 
Decimal fractions (25%)        10%                15% 
Percents (50%)                 20%                30% 
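
The arithmetic behind such a blueprint can be sketched in a few lines of Python. The sketch assumes that each cell is simply the content weight multiplied by the objective weight, which is consistent with the 10 percent and 15 percent cells shown for fractions. The function name `blueprint` and the test length of 40 items are hypothetical; the text does not say how many items the test contains.

```python
from fractions import Fraction


def blueprint(objectives, content):
    """Cross content weights with objective weights to get each cell's share of the test."""
    return {
        (area, objective): area_weight * objective_weight
        for area, area_weight in content.items()
        for objective, objective_weight in objectives.items()
    }


# Weights taken from the specifications for the fifty-minute arithmetic test.
objectives = {"computation": Fraction(40, 100), "word problems": Fraction(60, 100)}
content = {
    "fractions": Fraction(25, 100),
    "decimal fractions": Fraction(25, 100),
    "percents": Fraction(50, 100),
}

cells = blueprint(objectives, content)

# Hypothetical test length, chosen only for illustration.
TOTAL_ITEMS = 40
for (area, objective), share in cells.items():
    items = int(share * TOTAL_ITEMS)
    print(f"{area:18} {objective:14} {float(share):5.0%} {items:3d} items")
```

With a length that does not divide evenly, the item counts would have to be rounded by judgment; the percentages are a guide to emphasis, not an exact rule.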


In addition to the objectives and content of the test, the 
test-maker must also think of the length and the desired difficulty. The 
length of the test will depend on the amount of material to be cov- 
ered and the extent to which other measures will be available. If a 
test is given every few days, each test may be rather short. On the 
other hand, if an entire semester's grade is to be based on only two 
or three tests given during the semester, each test will, of necessity, 
be long and comprehensive. 


The difficulty of the items should be determined by the purpose 
of the test. If the test is being given to determine whether the pupils 
have mastered minimum essential learnings, it will not matter if 
the items are easy and all of the pupils receive high scores. How- 
ever, if the test is being given to determine which pupils know most 
about the subject, and which know the least, it will be necessary to 
include some difficult items; otherwise, there will be no way to dis- 
tinguish between the good pupil and the poor one. 

After the objectives, content, difficulty, and length of the pro- 
posed test have been determined, the items are written. At this 
point the test-maker has the choice of a wide variety of item forms. 
Only by being familiar with the various types of items and knowing 
their advantages and limitations can the item-writer decide which 
type or types of items to employ in any given test. 


Selection-type items and supply-type items: Two basic types 


Although there are many different types and variations of items, 
they can be divided into two major types—selection-type items and 
supply-type items. Selection-type items require the pupil to select 
a response from several alternatives supplied by the test-maker. 
Supply-type items require the pupil to provide a word, phrase, or 
number for the answer. Multiple-choice, matching, and true-false 
illustrate the selection-type items. Direct questions and comple- 
tion items constitute the supply-type item. The essay item is actu- 
ally a form of supply-type item although it is usually considered 
a different type and will be discussed separately in Chapter 2. 


Multiple-choice items 


The multiple-choice item consists of either a question or an 
incomplete statement followed by two or more possible answers 
to the question or completions of the statement. These possible 
answers are referred to as responses. The question form of the 
multiple-choice item is illustrated by the following example: 

Who was President of the United States of America in 1955? 

1. John Dulles. 

2. Dwight D. Eisenhower. 
3. Richard Nixon. 

4. Charles Wilson. 
This same item in the incomplete statement form would read: 

The President of the United States of America in 1955 was 

1. John Dulles. 

2. Dwight D. Eisenhower. 

3. Richard Nixon. 

4. Charles Wilson. 
For this particular item, there is no advantage to either the ques- 
tion form or the incomplete statement form of item. 

A fundamental requirement of a multiple-choice item is that 
the stem, the name given the question or incomplete statement, 
pose a distinct problem. Since the question form of item forces 
the writer to formulate a complete thought, this form of the mul- 
tiple-choice item is more appropriate for use by inexperienced 
item writers. Using the incomplete statement form of item can 
result in an item which is nothing more than a series of true-false 
statements as the following item illustrates. 

The President of the United States of America in 1955 was 

1. a Republican. 
2. formerly a Supreme Court Justice. 
3. formerly an officer in the Navy. 
4. a bachelor. 
This kind of item should be avoided. Unless there is clearly 
one central problem, the multiple-choice form of question is not 
appropriate. 

The examples of multiple-choice items given thus far have had 
four choices, have had one correct answer, and have called for a 
knowledge of factual material to determine the correct response. 
None of these conditions is necessary to a multiple-choice item. 
A multiple-choice item may have as few as two choices or as many 
as the item-writer can devise. The reason for usually having four 
or five choices is that it reduces the chances of a pupil’s guessing the 
right answer. However, each incorrect response, called a distractor, 
should be plausible to a person who does not know the correct 
answer. When multiple-choice items are written for use with young 
children, there should only be two choices, the correct answer and 
one incorrect answer. 
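
The effect of the number of choices on guessing can be illustrated with a short Python sketch. It assumes a pupil who guesses blindly, so that every choice is equally likely to be picked; this holds only when all distractors are plausible. The function names and the 20-item test are hypothetical.

```python
def chance_of_guessing(num_choices):
    """Probability that a blind guess hits the one correct answer."""
    return 1 / num_choices


def expected_chance_score(num_items, num_choices):
    """Average number of items a pupil would get right by blind guessing alone."""
    return num_items / num_choices


# Compare two-, four-, and five-choice items on a hypothetical 20-item test.
for choices in (2, 4, 5):
    print(choices, chance_of_guessing(choices), expected_chance_score(20, choices))
```

Under this assumption, moving from two choices to four halves the score a pupil can expect from guessing alone.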

The most common form of the multiple-choice item is that 
which calls for one right or best answer. A variation sometimes 
used is to vary the number of correct answers from item to item, 
and require the student to mark all correct answers. When this 
variation is used, it is possible for no answer to be right or for one 
or more of the answers to be right. The following item illustrates 
this variation of the multiple-choice item: 

Which of the following geometric figures contains four right 
angles? 

1. Square. 

2. Circle. 

3. Rectangle. 

4. Equilateral triangle. 
Items of this type can be marked by giving one point for each cor- 
rect answer and one point for each incorrect answer which is not 
marked. In the preceding example, the student who marked 
responses 1 and 3 would receive the maximum score of four points. 
The student who marked responses 2 and 4 would receive no points. 
This type of item can also be scored on an all-or-none basis; that is, 
if responses 1 and 3 were marked the answer would be right; any 
other combination of marks would result in no credit. 
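
The two scoring methods just described can be sketched in Python as follows. The function names are hypothetical; the key (responses 1 and 3) and the number of responses (four) come from the geometric-figures item above.

```python
def score_per_response(marked, correct, num_responses):
    """One point for each correct response marked and each incorrect response left unmarked."""
    marked, correct = set(marked), set(correct)
    points = 0
    for response in range(1, num_responses + 1):
        # The pupil earns the point when his mark agrees with the key for this response.
        if (response in marked) == (response in correct):
            points += 1
    return points


def score_all_or_none(marked, correct):
    """Full credit only when exactly the correct responses are marked."""
    return 1 if set(marked) == set(correct) else 0


KEY = {1, 3}  # Square and Rectangle
print(score_per_response({1, 3}, KEY, 4))  # pupil who marked 1 and 3
print(score_per_response({2, 4}, KEY, 4))  # pupil who marked 2 and 4
print(score_all_or_none({1}, KEY))         # partly right answer, all-or-none basis
```

Note that under the point-per-response method a pupil who marks nothing at all still earns two points on this item, for the two incorrect responses left unmarked. The all-or-none method avoids this.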

The multiple-choice item is the most versatile form of short- 
answer item. It can be used to measure skill, knowledge, under- 
standing, and application. Short-answer tests, including multiple- 
choice tests, have been criticized as measuring only factual out- 
comes. If the item-writer has a clear picture of what understanding 
or application is to be tested, and is willing to take the time and 
effort to develop good items, this objection can very easily be 
overcome. 

One method of measuring understanding is to provide the pupil 
with written or pictorial materials which pose a new and realistic 
problem situation, and then present him with objective (usually 
multiple-choice) items which test his ability to apply school-learned 
skills in solving these new problems. Although this type of item 
takes time to write, it can be used effectively to measure 
understanding. It has been used extensively in the Sequential Tests of 
Educational Progress. On the following pages are illustrations of 
items¹ from this series of tests in the areas of reading, writing, 
mathematics, science, and social studies. For further illustrations of this 
type of item see The Measurement of Understanding edited by 
N. B. Henry.² 

¹ Quoted from A Prospectus for the Sequential Tests of Educational Progress 
with the permission of the Cooperative Test Division, Educational Testing 
Service, Princeton, N. J., and Los Angeles, Calif. 


Samples of Reading Comprehension Test Material 
(Grades 4-6) 
Dear Bill, 

It was fun to be on the farm. Yesterday morning, Jack and I 
watched Aunt Mary make butter. She did not need to use all her 
cream to make butter. She sent most of the cream to the creamery. 

I wish I were a farmer. I would take just a little cream for but- 
ter. Then I would use all the rest of the cream to make ice cream. 
Wouldn’t that be fun? 

I'm sorry you could not go to Jack’s farm with me. I had the 
time of my life. Every day, Jack kept finding some new thing to do. 

I came back to town yesterday. I must say good-bye for now. 
Write soon. 

Your cousin, 
Betty 


41. In this letter, Betty is trying to tell 
A how to make butter. 
B what she did at the farm. 
C what horses eat. 
D how much noise a hog makes. 


42. In the first part, Betty tells about 
E how the creamery makes butter. 
F Betty and Jack making butter. 
G where cream comes from. 
H Aunt Mary making butter. 

43. Which of these things that Betty said tells best how she feels 
about living on a farm? 
A We worked around the barn. 
B I came back to town yesterday. 
C I wish I were a farmer. 
D We rode Jack’s horse. 


² Forty-fifth Yearbook, Part I, National Society for the Study of Education. 
Chicago: The University of Chicago Press, 1946. 
44. The letter is happy except where Betty is 
E saying Bill couldn’t come. 
F telling about riding the horse. 
G having to say good-by. 
H telling about the cream. 


45. Where does Betty live? 
A In the mountains. 
B On a farm. 
C Near the ocean. 
D In a town. 


Samples of Writing Test Material 
(Grades 10-12) 
My Favorite Magazine 


1 Many young people of today are taking a great interest in the 
magazine Suburbia. 2 It is of a fairly large size with a considerable 
number of pages. 3 The publishers, Allen and Watts, are well 
known and reputable; thus providing young homemakers with ideas 
and practical plans for their present and future homes. 4 Because 
it also contains articles of family and community relations and hints 
on home improvement, it is most likely preferable reading to people 
who are seeking guidance on such matters. 5 The previously 
mentioned content explains why the advertisements would logically be 
about products and furnishings for the home. 6 The articles were 
well written, and all the features of Suburbia help to form an 
interesting and informative magazine. 7 I found one particularly 
interesting article, it was entitled “Families Are Using Spare Time to 
Broaden Their Horizons.” 8 This article points out that [illegible] 
time should be spent in any activity other than usual work, instead 
of remaining in complete inactivity. 

8. In Sentence 2, how could the size of the magazine be indicated 
most effectively? 
E By comparing it with one or two well-known magazines. 
F By giving length, width, thickness, weight, and number of 
pages. 
G By drawing a scale model. 
H By telling how many articles each issue contained. 

9. As Sentence 3 is now written, which way of punctuating it is 
most acceptable? 
A reputable; thus (As it is now) 
B reputable: thus 
C reputable. Thus 
D reputable, thus 

10. Which of these revisions of Sentence 3 is best? 
E Since the publishers, Allen and Watts, are well known and 
reputable, they provide young homemakers with ideas and 
practical plans for their present and future homes. 
F The publishers, Allen and Watts, provide young 
homemakers with ideas and practical plans for their present and 
future homes; hence they are well known and reputable. 
G Allen and Watts are well known, providing young 
homemakers with ideas and practical plans for their present and 
future homes, as reputable publishers. 
H The publishers, Allen and Watts, who are well known and 
reputable, provide young homemakers with ideas and 
practical plans for their present and future homes. 

11. Sentence 5 is awkward. Which of the following revisions is 
best? 
A Advertisements are about products and furnishings for the 
home because this is logical in a magazine of such content. 
B The advertisements would logically be about products and 
furnishings for the home, like the articles. 
C In keeping with the content of the articles, the 
advertisements are about products and furnishings for the home. 
D Because the content of the articles, as previously explained, 
are about homemaking, so the advertisements properly are 
also. 

12. Since this report is just one paragraph, with which sentence 
should it stop? 
E Sentence 5. 
F Sentence 6. 
G Sentence 7. 
H Sentence 8. 


Samples of Mathematics Test Material 
(Grades 4-6) 


Situation: In Tom’s school, some children ride bicycles, some walk 
to school, and some ride the school bus. The pupils on the safety 
patrol have to come early. 


1. Two children from each class in the school were members of 
the safety patrol. To find how many patrol members there are 
altogether, what other factor would it be necessary for you 
to know? 


A The number of children in the school. 
B The number of classes in the school. 
C The number of children in each class. 
D The number of street crossings. 


(Grades 7-9) 


Situation: Mrs. Cain went to the power and light company to check 
on her electric bills and obtain information about electrical equip- 
ment. 

1. Mrs. Cain wanted to buy an electric blanket. The office man- 
ager of the electric company told her that the blanket would 
cost 3 cents a night to use. If she used it 200 nights out of the 
365 nights of the year, the yearly cost would be 
A $ 4.95. 

B $ 6.00. 
C $10.95. 
D $60.00. 

2. Mrs. Cain’s house has 4 electrical circuits. The power com- 
pany recommends having one circuit for each 500 square feet 
of floor space. On this basis, how many additional circuits 
should Mrs. Cain have installed? 

El 


H cannot be determined from the information given. 
(Grades 10-12) 
Situation: Mr. Jones has a dairy farm on which he also grows corn. 


1. Mr. Jones has two fields of equal size on which he grows corn. 
If 7/8 of field I and 8/9 of field II are devoted to corn, which 
one of the following statements is true? 

A Field I has more space devoted to corn. 

B Field II has more space devoted to corn. 

C Equal space is devoted to corn in both fields. 

D The amounts of space devoted to corn cannot be compared. 


2. Mr. Jones said that 3/4 of his cows were Jerseys, but only 2/3 
of his neighbor’s cows are Jerseys. If the neighbor's herd is 
larger than Mr. Jones’, which one of the following statements 
is true? 


A They have the same number of Jerseys. 

B Jones has more Jerseys. 

C The neighbor has more. 

D It cannot be determined who has more Jerseys. 


(Grades 13-14) 


Situation: The Mill City Statistical Agency conducts opinion polls 
and surveys and performs related statistical research. 


1. A new interviewer for the agency reported at the end of his 
first day that he had interviewed 100 people. He said that 42 
of these were men, of whom 30 were Democrats, and that 49 
were Republican women. The agency needed to know how 
many Democratic women he had interviewed but all he could 
remember was that everyone had been either Democratic or 
Republican. From this information alone, it is possible to 
determine that 


A the data are contradictory. 

B there were 9 Democratic women. 

C there were 19 Democratic women. 

D there is still insufficient data for an answer. 


2. In the group interviewed in the preceding question, 30% of 
the persons were Democratic men. Another interviewer 
reported 20% Democratic men in a second sample. If their 
reports are combined, then the percent of Democratic men 
A is 50%. 
B is 25%. 
C cannot be computed without knowledge of the size of the 
second sample. 
D is unknown because the figures are contradictory. 


Samples of Science Test Material 
(Grades 4-6) 


Situation: Tom wanted to learn which of three types of soil—clay, 
sand, or loam—would be best for growing lima beans. He found 
three flowerpots, put a different type of soil in each pot, and 
planted lima beans in each. He placed them side by side on the 
window sill and gave each pot the same amount of water. 


[Figure: three flowerpots labeled LOAM, CLAY, and SAND] 

The lima beans grew best in the loam. Why did Mr. Jackson 
say Tom’s experiment was NOT a good experiment and did NOT 
prove that loam was the best soil for plant growth? 
A The plants in one pot got more sunlight than the plants in the 
other pots. 
B The amount of soil in each pot was not the same. 
C One pot should have been placed in the dark. 
D Tom should have used three kinds of seeds. 




(Grades 7-9) 


Situation: Tom planned to become a farmer and his father 
encouraged this interest by giving Tom a part of the garden to use for 
studying plant life. 

Tom wanted to find out what effect fertilizer has on garden 
plants. He put some good soil in two different boxes. To box A he 
added fertilizer containing a large amount of nitrogen. To box B 
he added fertilizer containing a large amount of phosphorus. In 
each box he planted 12 bean seeds. He watered each box with the 
same amount of water. One thing missing from Tom’s experiment 
was a box of soil with 


A both fertilizers added. 


B neither nitrogen nor phosphorus fertilizers added. 
C several kinds of seeds planted. 
D no seeds planted. 


(Grades 10-12) 


Situation: You and your family are visiting the Grand Canyon 
National Park in Arizona. The canyon, one of the geological won- 
ders of the world, is a gigantic gorge as much as 18 miles across 
and a mile deep. At the bottom of this gorge the Colorado River 
is now flowing through an inner gorge of extremely ancient meta- 
morphic rocks, which are covered by thousands of feet of varied 
sedimentary formations. 

Upon reaching the bottom of the canyon, you find the Colorado 
River extremely turbulent and muddy. To determine how much 
mud and other eroded material is in the water, it would be best 
to take measured samples of the river water and 


A determine the average molecular weight of the samples. 

B evaporate the water and weigh the residue. 

C add reagents to precipitate dissolved minerals, filter, and weigh 
the residue. 

D filter, evaporate the water, and weigh the residue from the 
filtrate. 


(Grades 13-14) 

Situation: The Alpha Uranium Company is organized to prospect 
for, obtain, and refine uranium ores. 

As a protection against radiation injuries, a check is needed to 
determine whether plant employees have been exposed to too much 
radiation. Which of the following safety procedures would be best? 

A Providing each worker with a Geiger counter to be carried at 
all times. 

B Providing each worker with a radiation-sensitive piece of pho- 
tographic film mounted in a badge. 

C Having each employee pass by a Geiger counter as he leaves 
the plant. 

D Taking a weekly X-ray of each employee and checking for 
radiation bone-damage. 


Samples of Social Studies Test Material 
(Grades 4-6) 


The students are presented with a simple map of an imaginary 
island on which places are indicated by numbers. 


[Map of an imaginary island with numbered places; scale of miles: 0, 350, 700] 


They are asked questions such as: 
1. If explorers came to this island by ship, where would they find 
the safest harbor? 
A 2   B 4   C 9   D 10 

2. Which of these places is on a peninsula? 
E 4   F 6   G 7   H 9 


(Grades 10-12) 


The students are provided with monthly temperature and rain- 
fall charts for four places. 


[Figure: monthly temperature and rainfall charts for four cities, I through IV; axes: Degrees of Temperature and Inches of Rainfall; points labeled by month] 

They are asked such questions as: 
1. Which of these cities are north of the equator? 
A I and II only. 
B II and IV only. 
C All of the cities. 
D None of the cities. 

2. In which city would one need the greatest variety of weights 
of clothing? 
E I   F II   G III   H IV 


Suggestions for writing multiple-choice items 


The quality of multiple-choice items can be improved by follow- 
ing these suggestions: 


1. Be certain that each item has a central problem. One way to 
test this is to try to phrase the item as an essay item. Unless this 
can be done, there is no central problem. 

2. Be certain that there is only one correct answer, unless you are 
using the variation which permits multiple correct answers. 


3. Be certain that each option is grammatically correct and rele- 
vant to the stem. Test your items by reading the stem followed by 
each possible answer separately. This check will often turn up 
poorly worded responses. 

4. Avoid using phrases lifted directly out of the text. 

5. Avoid having the correct answer longer than the incorrect 
answers. 

6. Avoid writing “negative” questions, those which ask for the 
wrong answers rather than the right one. This type of item can be 
very confusing if included in a test where the student has a “set” 
to look for the correct answer. 


Matching items 


The matching item is a form of the multiple-choice item. It 
differs from the usual multiple-choice item in that a number of 
problems are presented simultaneously together with a number of 
answers, each of which is a possible answer to each of the problems. 
The problems and the answers are usually presented in two parallel 
lists with the problems in the left-hand list and the answers in the 
right-hand list. The student’s job is to match each problem in the 
left-hand list with the correct answer in the right-hand list. The 
example below illustrates the form of the matching question, and 
illustrates also some defects often found in this type of item. 
Instructions: In the blank in front of each statement in the first 
column place the letter preceding the word or phrase in the 
second column that is most closely related to it. 

____ 1. Largest city in California.            A. 1492. 
____ 2. A president of the United States.      B. George Washington. 
____ 3. The year in which Columbus             C. Texas. 
        discovered America. 
____ 4. Largest state in the United States.    D. Los Angeles. 



This example is a poor one for two reasons. First, the statements 
in the left-hand column have nothing to relate them to each other. 
Because they are so heterogeneous, each item has only one logically 
possible answer in the second column. The second criticism of the 
item is that there are an equal number of problems and answers. 
If each answer can be used only one time, the person who knows 
the answers to all of the problems except one will be able to 
answer the last problem by elimination. 

The following example is free of these two faults. 


____ 1. First president of the United States.      A. Eisenhower. 
____ 2. Only United States president to be         B. Lincoln. 
        elected for four terms.                    C. F. D. Roosevelt. 
____ 3. President of the United States who         D. T. Roosevelt. 
        was a five-star general in World           E. Truman. 
        War II.                                    F. Washington. 
____ 4. President of the United States when 
        the slaves were freed. 


The matching item is best used in testing factual knowledge 
such as names, dates, places. The ease with which the items can 
be constructed may lead the teacher to “overtest” on factual infor- 
mation. 


Suggestions for writing matching items 
1. The problems in a matching item should be homogeneous— 
should all be of the same general type. (E.g., dates, names, places.) 


2. There should be more possible answers provided than there 
are problems presented. 


3. Each matching item should be relatively short. If a long list 
of problems is to be presented in matching form, split them into 
two or more items. 
4. Arrange the possible answers in a logical order (alphabetical, 
chronological, etc.) if such an order exists. In the last example 
above note that the presidents are listed in alphabetical order. 


True-false items 


The true-false item consists of a statement which is to be judged 
either true or false by the student. 

Example: There are two pints in a quart. T F 
The student responds to a true-false item by choosing either the 
T if he believes the statement is true or the F if he believes the 
statement is false. 

Sometimes the difficulty of the true-false items is increased by 
requiring the pupil to make every false statement true by replacing 
a key word in the sentence. 

Example: There are three pints in a quart. T F ________ 
If the pupil marks the statement false, he is expected to write a 
word in the blank which will make the statement true. In the 
above example, the pupil would be expected to circle the F and 
write the word “two” in the blank. 

True-false statements have had wide use in teacher-made tests. 
The fact that they can be written easily has led some teachers to 
construct them carelessly and use them excessively. Teachers often 
construct true-false tests by “lifting” a number of true statements 
directly from the book, making some of the statements false by 
changing a word or by inserting a negative at a convenient spot. 
Unfortunately, tests constructed in this manner are usually poor 
tests, in that they encourage rote memorization of text material, a 
goal not usually endorsed by the teacher. 

The true-false item should only be used when a simple state- 
ment is either completely true or completely false. Since only a 
small percentage of important items in most areas of learning meet 
this criterion, the number of true-false items which can be used 
in a test is limited. Determining if a statement is 100-percent true 
is sometimes difficult. The pupil faced with the problem of deciding 
whether the following statement is correct is in a quandary. 

If three boys divide a dollar among them, each 
will have 33-1/3 cents. T F 

Is it true or is it false? Since $1.00 divided by 3 is 33-1/3 cents, the 
item is true, but since there is no such thing as 1/3 cent in our 
monetary system, the boys cannot have 33-1/3 cents. Therefore, 
the statement is false. Pupils should not be forced to guess which 
of these interpretations the teacher wishes them to make. An item 
which is partly true and partly false should not be included in a 
true-false test. 

Although the true-false item has received its greatest use in test- 
ing memory for simple facts, it is possible to utilize this item form 
in testing more complex reasoning processes. Certainly the truth 
or falsity of the following item is not determined by recourse to 
rote memorization. 

A box 6” x 8” x 12” contains the same number of 
cubic inches as a box 3” x 16” x 12”. T F 

The true-false item is one which most teachers will find useful 
if they recognize its limitations and use it sparingly with the 
following safeguards: 


1. Use the true-false item form only for items which are either 
100-percent true or 100-percent false. 


2. Avoid writing true-false items in which the statements are 
lifted verbatim from the text. 


Direct question and completion items 


In contrast to the selection-type items just discussed, supply-type 
items require the student to supply the answer in his own words. 

The two major forms of the supply-type item are the direct 
question and the completion item. The direct question is actually 
a form of the essay item, but it is usually restricted to questions 
which can be answered in a word, a sentence, or a number. The 
following example illustrates this type of item: 

Who discovered America? 

This same item can also be written as a completion item as follows: 

America was discovered by ________. 

However, in this form the item is ambiguous since the answer 
“accident” is just as correct as “Columbus,” although the item-writer 
probably did not have “accident” in mind as a possible correct 
answer. 

Supply-type items often have more than one correct answer. 
Many words have synonyms and near-synonyms, and it is often 
difficult to determine just when an item has been answered cor- 
rectly. Even mathematical problems sometimes pose a problem for 
the scorer. For example, if the correct answer to a problem is 5-1/3, 
is 5.33 right, or is it wrong? 

The advantages of supply-type items are that they are 
completely free of the effects of guessing and that they motivate pupils 
to learn material to the point where it can be recalled. The major 
disadvantage is the difficulty of determining which answers shall be 
accepted as correct. 

Supply-type items are particularly useful in mathematics 
or science, where the results of complex reasoning processes can be 
represented by a few symbols or numbers. 


Suggestions for writing supply-type questions 

1. When possible, use a direct question rather than the comple- 
tion form of item. 

2. Use only questions which can be answered by a unique word, 
phrase, number, or symbol. 

3. Avoid using statements lifted directly out of the book, since 
this tends to overemphasize rote learning. 

4. In computational problems, specify the units in which the 
answer is given and also the degree of precision expected. 


5. Avoid using completion items with too many words omitted. 
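
The fourth suggestion, on stating the degree of precision expected, can be made concrete with a short Python sketch. It takes up the earlier question of whether 5.33 should be accepted when the correct answer is 5-1/3. The function name `is_acceptable` and the tolerance of 0.01 are assumptions chosen for illustration; the teacher must decide the tolerance and announce it in the item.

```python
from fractions import Fraction


def is_acceptable(answer, key, tolerance):
    """Accept an answer that lies within the stated tolerance of the keyed answer."""
    return abs(Fraction(answer) - Fraction(key)) <= Fraction(tolerance)


KEY = Fraction(16, 3)  # the keyed answer 5-1/3, held exactly

# Hypothetical rule announced in the item: "Give your answer to the nearest hundredth."
print(is_acceptable("5.33", KEY, "0.01"))  # differs from the key by 1/300
print(is_acceptable("5.3", KEY, "0.01"))   # differs from the key by 1/30
```

Exact fractions are used here so that the comparison is not affected by rounding inside the computer itself.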


General suggestions for writing short-answer test items 


The following suggestions apply to writing all types of short- 
answer test items. 

1. As ideas for test items occur, make a note of them. During the 
day-to-day teaching activities, ideas for good test items will occur to 
the teacher. Unless notes are made while the idea is fresh in mind, 
the chances are that it will not be remembered when the time comes 
to construct a test. 

2. After a group of test items has been written, some other 
teacher or other person who knows the material well should look 
them over and try to answer them. If another teacher does not 
agree with the answer to a question, there may be something wrong 
with the item. 


3. If someone else cannot be found to criticize the items, they 
may be put aside for a few days, then read again. Often this 
procedure will reveal ambiguous items which were not apparent at 
first glance. 


4. The reading difficulty of the test items should be kept as low 
as possible, unless the test is to be used to measure the student's 
reading ability. 


Assembling, administering, and scoring the objective test 


A test is a collection of items. Tests can be composed of items 
all of the same type (e.g., true-false test, multiple-choice tests), or 
they can consist of a variety of item types. Including a variety of 
item types is usually preferable in a lengthy test, since it provides 
more flexibility in covering the material. 

If a variety is used, each type should be grouped into a separate 
section of the test. Thus a test might consist of a group of true- 
false items, plus a group of multiple-choice items, plus a group of 
direct-question items. Teachers frequently include both objective 
and essay items on the same tests. If a group of items contains some 
items which are more difficult than the others, it is preferable to 
place these items at the end of the group. 

In assembling the items, care should be taken that the occurrence
of correct responses follows a random pattern. Avoid a regular
pattern of correct answers. Also avoid having any particular response
position as the correct answer more frequently than any other 
response position. Thus in a set of four-choice multiple-choice 
items choices 1, 2, 3, and 4 should each be the correct answer 
approximately one fourth of the time. In a set of true-false items 
there should be approximately the same number true as there 
are false. 
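
The balancing rule just described can be checked mechanically. The following Python sketch is an illustration only and is not part of the original text; the function names `balanced_key` and `position_counts` are hypothetical. It assumes items with a fixed number of choices, builds a key in which each response position is correct about equally often, and shuffles the key so that the correct answers follow no regular pattern.

```python
import random
from collections import Counter


def balanced_key(num_items, num_choices=4, seed=None):
    """Return a list of correct-answer positions (1-based), roughly balanced."""
    rng = random.Random(seed)
    # Cycle through positions 1..num_choices so each is used about equally often.
    positions = [(i % num_choices) + 1 for i in range(num_items)]
    # Shuffle so that the correct answers follow no regular pattern.
    rng.shuffle(positions)
    return positions


def position_counts(key):
    """Count how often each response position is the correct answer."""
    return Counter(key)


# Hypothetical 20-item, four-choice test.
key = balanced_key(20, num_choices=4, seed=1)
print(key)
print(position_counts(key))
```

For a true-false test the same sketch can be used with `num_choices=2`, treating position 1 as "true" and position 2 as "false."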

Objective tests are usually dittoed or mimeographed so that each 
pupil has a copy of all of the items. Pupils may indicate their 
answers by marking directly on the test or by marking on a separate 
answer sheet. Separate answer sheets should be used only with 
mature students who will not be confused by their use—usually 
junior high school and senior high school students. If answers are 
to be recorded on the test paper, the scoring may be facilitated by 
providing spaces for answers to the items in a column down one side 
of the test. A scoring key can then be laid beside the column, and 
the right and wrong answers easily determined. 

The assembled test should include directions to the pupils. These
directions should specify the method to be employed in responding
to the items (e.g., circle the correct answer; cross out the letter that 
corresponds to the correct answer). The pupil should also be in- 
structed whether to “guess” or not when he is not sure of an answer. 
Formulas have been developed which are designed to correct for 
“guessing.” The use of these formulas is neither necessary nor advis- 
able when sufficient time has been allowed for almost all pupils to 
attempt all of the items, and when pupils have been instructed to 
respond to all items, even if they are not positive of the answer. For 
short-answer teacher-made tests it is recommended: (1) that suffi- 
cient time be allowed for all pupils to attempt all of the items, and 
(2) that pupils be instructed to answer all items on the test. 
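
The text does not state the correction formulas it refers to. As an assumption, the sketch below uses the conventional form, in which the corrected score is the number right minus the number wrong divided by one less than the number of choices per item. The function name `corrected_score` and the sample scores are hypothetical. The example is meant only to show why the correction adds little when every pupil answers every item.

```python
def corrected_score(num_right, num_wrong, num_choices):
    """Return R - W / (k - 1), the conventional correction for guessing.

    num_right   -- items answered correctly (R)
    num_wrong   -- items answered incorrectly (W); omitted items are not counted
    num_choices -- number of choices per item (k); 2 for true-false items
    """
    if num_choices < 2:
        raise ValueError("each item needs at least two choices")
    return num_right - num_wrong / (num_choices - 1)


# Hypothetical 40-item, four-choice test on which every pupil answers every item.
NUM_ITEMS = 40
raw_scores = {"Pupil A": 36, "Pupil B": 30, "Pupil C": 22}

for name, right in raw_scores.items():
    wrong = NUM_ITEMS - right  # no omissions, so every miss is a wrong answer
    print(name, right, round(corrected_score(right, wrong, 4), 2))
```

When no items are omitted, the number wrong is simply the number of items minus the number right, so the corrected score rises and falls in step with the raw score and the pupils keep the same rank order. This is consistent with the recommendation above to have pupils answer all items and to score the number right.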

The importance of a carefully prepared scoring key is sometimes 
overlooked. This key should be checked and rechecked to be cer- 
tain that it contains no errors. The actual scoring process consists of 
comparing the pupils’ responses with the answer key and indicating 
which are right and which are wrong. The most widely used 
method of scoring is to give one point for each short-answer ques- 
tion answered correctly. Scoring methods that give various weights 
to different items have not proved useful. The total number of 
items right is the students’ “raw score” on the test. The importance 
of accuracy in scoring tests is obvious. Provisions should be made 
for some means of checking scoring, especially on important tests. 
This check could take the form of a rescoring by the teacher, a 
second scoring by an assistant, or reviewing the answers to the test 
during a class period with pupils checking their own responses. 
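
The scoring procedure described above (one point per item answered correctly, followed by a check of the scoring) can be sketched in Python. This is a minimal illustration, not a procedure given in the text; the names `raw_score` and `scoring_discrepancies`, the key, and the pupils' answers are all hypothetical.

```python
def raw_score(responses, key):
    """Give one point for each response that matches the scoring key."""
    if len(responses) != len(key):
        raise ValueError("responses and key must cover the same items")
    return sum(1 for given, correct in zip(responses, key) if given == correct)


def scoring_discrepancies(first_scoring, second_scoring):
    """Return the pupils whose two independent scorings do not agree."""
    return sorted(
        name
        for name, score in first_scoring.items()
        if second_scoring.get(name) != score
    )


# Hypothetical five-item true-false test.
KEY = ["T", "F", "T", "T", "F"]
answer_sheets = {
    "Pupil A": ["T", "F", "T", "T", "F"],
    "Pupil B": ["T", "T", "T", "F", "F"],
}

first_scoring = {name: raw_score(sheet, KEY) for name, sheet in answer_sheets.items()}
print(first_scoring)

# A second scoring, entered by hand by an assistant, is compared with the first.
second_scoring = {"Pupil A": 5, "Pupil B": 4}
print(scoring_discrepancies(first_scoring, second_scoring))
```

Any pupil named by `scoring_discrepancies` has a paper that should be rescored before the grade is recorded.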


Analyzing the results of an objective test 


With experienced teachers the testing process does not stop with 
scoring the test and recording the grade; instead the test results 
serve as guides to further teaching and also as means of improving 
the quality of future tests. 


By examining the responses of the class to individual items in the
test, the teacher can discover items which were missed by many
members of the class. Investigation can then reveal why the item
was so difficult, and if some fact, principle, generalization, or other
objective of instruction has not been learned, the teacher can
“reteach” the objective. By having pupils explain why they chose
the incorrect answer that they did, the teacher often gets insight
into the nature of the pupils’ difficulties.

In addition to discovering items which are difficult for the class 
as a whole, the teacher can use an individual’s test paper diagnos- 
tically by finding out what difficulties the individual is having, and 
then working with him to overcome these difficulties. 

Teachers can also use the results of testing to discover important 
information about the quality of their test items. If the test items 
are discussed with the class after the test has been given, items 
which are ambiguous, which have no right answers, or which have 
more than one right answer, may be discovered. The fair teacher 
will not penalize pupils for poor items and will discard obviously 
poor items from the test. 

One method of discussing the test with the class before scoring 
the test is to have pupils mark their answers on both the test itself 
and on a separate answer sheet. The answer sheets are collected 
after the test, while the pupils keep their copies of the test in front 
of them. Then, before the tests are scored, the test items are dis- 
cussed with the class, and pupils are given their chance to comment 
on any of the items on the test which they do not understand. If, 
during this discussion period, any items which are ambiguous or 
otherwise poor are discovered, these items can be omitted from the 
scoring key when the separate answer sheets are scored. Naturally, 
the poor items will be revised before they are used in future tests. 

The clerical work involved in determining how many pupils 
chose each response to each item in an objective test can be reduced 
if this activity is made a part of the discussion of the test. For 
example, when discussing a particular multiple-choice item the 
question may be asked “How many chose the first answer?” “How 
many chose the second answer?” and so on. Before using the items 
in another test it may be desirable to make revisions, replacing 
responses which were not attracting any of the pupils who did not 
know the right answer. By keeping and referring to a file of old 
tests or test items, together with a record of how difficult the items 
were, the teacher can continually improve her own tests. 
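
The tally described above can also be made from the collected answer sheets. The following Python sketch is offered under stated assumptions: each answer sheet is a list of the response positions a pupil chose, and the names `tally_responses`, `unused_wrong_responses`, and `proportion_correct` are hypothetical. It counts how many pupils chose each response, lists wrong responses that attracted no one, and records how difficult each item was.

```python
from collections import Counter


def tally_responses(answer_sheets, num_choices):
    """For each item, count how many pupils chose each response position."""
    num_items = len(answer_sheets[0])
    tallies = []
    for item in range(num_items):
        counts = Counter(sheet[item] for sheet in answer_sheets)
        tallies.append(
            {choice: counts.get(choice, 0) for choice in range(1, num_choices + 1)}
        )
    return tallies


def unused_wrong_responses(tally, correct):
    """List the wrong responses that no pupil chose (candidates for revision)."""
    return [choice for choice, n in tally.items() if choice != correct and n == 0]


def proportion_correct(tally, correct):
    """Fraction of responding pupils who chose the correct answer."""
    total = sum(tally.values())
    return tally[correct] / total if total else 0.0


# Hypothetical two-item, four-choice test taken by four pupils.
KEY = [1, 3]
sheets = [[1, 3], [1, 2], [2, 3], [1, 3]]

for number, (tally, correct) in enumerate(zip(tally_responses(sheets, 4), KEY), start=1):
    print(number, tally, unused_wrong_responses(tally, correct),
          proportion_correct(tally, correct))
```

Keeping the printed record with the file of old test items gives the teacher the difficulty figures mentioned above when the items are revised.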


EXERCISES 


1. Have a committee chosen from your class devise a 15-minute
short-answer test covering this chapter. Let the remainder of the
class take the test. After the class has taken the test, have a class
discussion about the quality of each item in the test.

2. Obtain a teacher-made short-answer test in your field and exam- 
ine it to see how it might be improved. When you find items in 
the test which you think are poor, rewrite them to see if you can 
improve them. 

3. After you read the next chapter compare the short-answer test 
with the essay test. What advantages does the short-answer test 
have over the essay test? What advantages does the essay test 
have over the short-answer test? 

4. Prepare ten multiple-choice test items on vocabulary, for a stated
grade and subject. 


SUGGESTED ADDITIONAL READINGS 


Adkins, D.C. Construction and Analysis of Achievement Tests.

Gerberich, J.R. Specimen Objective Test Items.

Henry, N.B. (Editor). Measurement of Understanding.

Lindquist, E.F. (Editor). Educational Measurement.

Odell, C.W. How to Improve Classroom Testing.

Remmers, H.H., and Gage, N.L. Educational Measurement and
Evaluation.

Ross, C.C., and Stanley, J.C. Measurement in Today's Schools.

Thorndike, R.L., and Hagen, E. Measurement and Evaluation in
Psychology and Education.

Travers, R.M.W. How to Make Achievement Tests.


CHAPTER 3 


Evaluating Achievement with 
Teacher-devised Essay Tests 


The essay question is actually a supply-type item. It differs from 
the direct-question type discussed in the preceding chapter in two 
ways. The most important difference lies in the fact that the direct- 
question item used in short-answer tests can be scored either right 
or wrong, whereas the essay item permits answers which vary in 
their degree of rightness. The question, “Who is the president of 
the United States?” has only one correct answer and can therefore 
be considered a form of objective item. The essay item, “Describe 
and explain the duties and powers of the president of the United 
States of America,” permits many different answers which vary 
greatly in the extent to which they are considered correct by the 
person scoring the examination. The second difference between the 
direct-question short-answer item and the essay item is found in 
the length of the answer. Direct questions used as objective items 
usually can be answered in a word or a phrase. Essay items
generally require considerably longer answers.


Purposes of essay tests 


Some courses such as English or journalism include among their 
objectives the ability to organize and present material in written
form. The most direct and obvious way to measure achievement in
this area is by means of the essay test or the assigned paper. Since 
papers written outside of the classroom are sometimes produced by 
persons other than those who submit them, the essay test becomes 
the best measure of how well a pupil can handle the English 
language. Essay tests used in English composition classes can 
reveal the ability of the pupil to express himself in an organized 
fashion. They can also be used to obtain evidence regarding the 
pupil’s achievement in grammar, spelling, and handwriting. 

The essay test is also useful in measuring the complex objectives 
of instruction in other courses, such as social studies. Although 
short-answer items can be written to measure complex mental proc- 
esses, they are difficult to construct. For this reason, many teachers 
prefer to utilize essay items to measure their pupils’ ability to 
organize and critically evaluate facts and ideas drawn from broad 
and complicated bodies of subject matter. 


Advantages of the essay test 


The greatest advantage of the essay test is its suitability in meas- 
uring those complex learnings which cannot easily be measured by 
means of short-answer tests. It also has the advantage of encourag- 
ing pupils to study and learn material in large and interrelated 
units rather than as fragmentary and isolated facts. Another
advantage of the essay test is the relative speed with which essay
tests can be written as compared with short-answer tests.


Disadvantages of the essay test


The major criticism of the essay test is that the reliability of
scoring the test is usually low compared with that of short-answer
tests. Many studies confirm that two different persons scoring a set
of essay tests may differ widely on scores assigned to the papers.
There are, however, methods of scoring which lead to increased
agreement between scorers so that this disadvantage need not
necessarily be a serious one.

The essay test has also been criticized on the grounds that it
leads to inadequate sampling of material studied. This criticism is
based on the fact that essay tests sometimes consist of a very small
number of items. If a pupil does not happen to be well prepared
on one of the items, it may lower his score greatly, even though he
may be well prepared in a majority of the areas which the test was
supposed to cover. The pitfall of “limited sampling” can be
avoided by using a larger number of items, each of which requires
brief essay responses (a paragraph or two), in preference to a few
items each requiring long and involved answers.


Since essay tests are very time-consuming to score, adequate time 
must be budgeted for reading the papers. 


Constructing the essay test 


Some goals can best be evaluated by means of short-answer items
and others by means of essay items. If a goal can be adequately
evaluated by means of short-answer items, they should be used in 
preference to essay items. Reserve the essay item to evaluate those 
goals which cannot be easily or adequately measured by short- 
answer items. Questions of a factual nature which require the pupil 
to answer “who,” “what,” “when,” or “where” should be tested by 
use of short-answer items. Essay items are usually reserved for the 
measurement of more complex learnings. Typical of essay items are 
those which require the pupil to “explain,” “compare,” “contrast,” 
“interpret,” “show differences,” or “summarize.” There are many 
more “key words” which are characteristic of essay items. What they 
all have in common is the fact that they require the pupil to demon- 
strate his understanding of what he has learned. 

After determining which goals require the use of essay items,
make a brief outline of the content to be covered by the items. Then
determine the relative importance of the various parts of the
content, and decide which aspects of the content to include in the
test items. Finally, write the items. Usually a better test will
result from the use of a relatively large number of short essay
items, rather than just a few long items.

Items should usually be specific enough so that the pupils will not
have to guess the nature of the expected answer. The item “Discuss
the United Nations” is so broad that the only possible justification
for its use would be to see what aspects of the United Nations
seemed to be important to the individual pupils. Many more
specific questions could be written to elicit pupils’ knowledge about
certain aspects of the United Nations. For example, “Discuss the
purposes for organizing the United Nations.” “On what grounds has
the United States of America opposed the admission of Communist
China into the United Nations?” and so on.

In general, the essay items should be as specific as possible, as 
long as the specificity does not defeat the intended purpose of the 
item. Every effort should be made to make the item clear and 
unambiguous, so that each pupil will have the same understanding
of what kind of an answer is expected. As with short-answer items, 
essay items can profit from previous trial on persons who know 
something about the material being tested. 

The practice of allowing pupils to select the items they wish
to answer from a longer list of items is open to serious question.
When this practice is followed, it becomes impossible to compare
the performances of the various pupils with each other. Thus, one
of the usual purposes of an examination, comparison of pupil
achievement, is defeated. All students should be required to run
the same race (answer the same questions) when essay items are
used to measure common learnings.


Scoring the essay test 


There are two basically different methods of scoring essay items:
the analytical method and the sorting method.

The analytical method consists of constructing a model answer,
analyzing this answer into a number of separate elements, and
assigning some arbitrary number of points to each of the elements.
After the model answer is constructed, each pupil's answer is
compared with this answer and assigned points according to whether
the answer contains the elements included in the model answer.

The sorting method consists of reading the answer to each question
as a whole, without detailed analysis of the points or elements
which it contains, and deciding on its over-all quality. After a
question is read, the paper is placed into one of five piles:
“Superior,” “Above average,” “Average,” “Below average,” and
“Poor.” This process is continued until all of the papers have been
sorted into one of the five piles. Then the papers in each pile are
reread to be sure that they have been properly allocated, and any
changes which seem indicated are made. Finally, a score
is assigned to each of the piles, and each paper in that pile receives
that score. The number 5 can be assigned to the “Superior”
answers, 4 to the “Above average” answers, and so on with “Poor”
receiving a score of 1. The score of 0 can be reserved for complete
failure to answer the question. This process is repeated for each 
of the questions on the test, and the total score on the test consists 
of the sum of the numbers assigned to the individual items. 
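
The arithmetic of the two scoring methods can be sketched in Python. This is an illustration under stated assumptions, not a procedure from the text: the judgment of which elements an answer contains, or which pile it belongs in, is still made by the reader, and the names `analytical_score`, `sorting_score`, and `total_sorting_score`, along with the model-answer elements and point values, are hypothetical.

```python
# Sorting method: the scores 5 through 1 for the five piles, as described above.
PILE_SCORES = {
    "Superior": 5,
    "Above average": 4,
    "Average": 3,
    "Below average": 2,
    "Poor": 1,
}


def sorting_score(pile):
    """Score one answer; None stands for complete failure to answer."""
    if pile is None:
        return 0
    return PILE_SCORES[pile]


def total_sorting_score(piles_for_paper):
    """Sum the pile scores assigned to each question on one paper."""
    return sum(sorting_score(pile) for pile in piles_for_paper)


def analytical_score(element_points, elements_found):
    """Add the points for each model-answer element the reader found."""
    return sum(
        points
        for element, points in element_points.items()
        if element in elements_found
    )


# Hypothetical model answer broken into three elements with arbitrary points.
model_answer = {"names the purpose": 3, "gives an example": 2, "states a limitation": 1}
print(analytical_score(model_answer, {"names the purpose", "states a limitation"}))

# Hypothetical three-question paper scored by the sorting method.
print(total_sorting_score(["Superior", "Average", None]))
```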

In using either the analytical method or the sorting method
there are several suggestions which will improve the reliability of
scoring.


1. Before scoring the papers, devise a model answer. Decide how 
much weight should be placed on content, and how much on organ- 
ization. Unless handwriting, spelling, grammatical usage, etc., are 
included in the objectives to be evaluated by a test, ignore these 
aspects of the answer in reading the papers. If any of these aspects 
are to be included in the over-all score, the paper should be read 
separately for spelling, grammar, etc., and a separate score assigned
for performance in these areas. This score can be added into the
total score if desired.


2. Read the answers anonymously; do not look at the name of
the person who wrote the test until after the papers are scored.
Knowledge of who wrote the answer can unfairly influence the test
scorer. Anonymity can be secured by having students write their
names on the backs of the examination papers where they will not
be seen.


3. Score only one question at a time. If the test consists of more
than one item, as it usually will, first score item one on all papers.
Then score item two on all papers, and so forth, until all of the
items have been scored.


EXERCISES 
1. Can you think of any possible use for a question like “Discuss
Shakespeare”? Prepare two or three short essay questions which
might be used to measure an eleventh-grade student’s knowledge
of Shakespeare and his works.


2. Draw a parallel between the items on an essay test and the
separate events in the decathlon. Are the athletes permitted
any choice in the events to be included in the decathlon? What
would happen if each athlete were permitted to participate in
any ten events of his own choice? What would happen if each
student were allowed to select the test questions he wants to
answer?

3. Define an objective in your field which can be tested by means 
of an essay item. Devise an essay item to measure this objective. 
Write a model answer for the item. 


4. List as many “key words” as you can which characterize essay
items (e.g., “explain,” “compare,” “contrast”).


5. What is meant by the term “limited sampling,” and how is it 
related to essay tests? 


SUGGESTED ADDITIONAL READINGS 
Lindquist, E.F. (Editor). Educational Measurement.
Remmers, H.H., and Gage, N.L. Educational Measurement and
Evaluation.
Ross, C.C., and Stanley, J.C. Measurement in Today’s Schools.


CHAPTER 4 


Evaluating Achievement Through 
Products and Performances 


Although many objectives of instruction can be evaluated by the 
use of paper-and-pencil tests, many other objectives cannot be 
evaluated in this manner. Paper-and-pencil tests can be used to 
determine whether a student knows the rules of baseball, the cor- 
rect temperature to bake a cake, or the approved method of joining 
two pieces of wood. However, the fact that the pupil can give the 
correct answer to questions about baseball, baking, or woodworking
is no guarantee that he can actually play baseball, bake a cake,
or make a bookcase. Since many objectives of instruction actually
call for the pupil to be able to do something, rather than just 
answer questions about doing it, it becomes necessary to have 
methods to evaluate the “doing.” 

Frequently the act of doing something produces an end product,
such as a cake or a bookcase, which can be evaluated after it has
been completed. At other times there is no end product involved
in the pupil’s activity, and the teacher can only evaluate the act
while it is being performed. Such activities as giving an oral
report or performing on a musical instrument produce no permanent
product and must be evaluated while they are being performed. In
some cases the teacher has the choice of evaluating either the end
product or the performance. For example, the pupil baking a cake
can be evaluated on the quality of the cake after it is baked, or
on the procedures which were followed in making the cake (using
correct ingredients, measuring ingredients correctly, mixing
ingredients properly, using correct baking temperature, etc.). At
times the teacher may want to evaluate both the end product, when
one exists, and the procedures used in arriving at the end product.

The basic difference between evaluating a product and evaluating
a performance is that the product can be evaluated at the teacher’s
convenience and can be examined at length. Performances, on the
other hand, must be evaluated “on the run,” and the teacher does
not have the chance to correct the evaluation.

As with other types of evaluation, the first steps in evaluating
a product or a performance consist of formulating the objectives,
translating the objectives into specific measurable pupil behaviors,
and then providing situations in which these behaviors can occur and
can be observed. At this point the teacher must have some device
which can be used to record and measure the behavior. Unless the
teacher has carefully analyzed the behaviors which are expected from
the pupils, she can make only a crude over-all evaluation of the
product or performance. This evaluation may not be based on a
consideration of the attainment of the important objectives of the
assignment.

Since the general method of evaluating products or performances
is essentially similar regardless of the particular product or
procedure to be evaluated, the evaluation of a written report will be
used to illustrate the method. Assume that a teacher has given a
social studies class the assignment: “Write a report on one of
the South American countries, covering briefly the population,
economy, geography, and political organization of the country.”
After the class has completed writing their reports and has turned
them in, the teacher is faced with the problem of evaluating the
reports. One method of evaluating the reports would be to scan them
one by one and assign a letter grade to each. Under this method of
evaluation papers which contain very little factual material but
which are neatly typed often receive higher grades than those which
have more factual material but are handwritten. In similar fashion,
a pupil who has done a good deal of research may have his grade
severely lowered because of spelling mistakes even though spelling
may not be a major objective of the particular assignment being
evaluated. In order to keep in mind just what is being evaluated in
a particular assignment, some kind of a list of objectives is needed
to guide the evaluation. In its simplest form this list is called a
check list and consists of a number of characteristics of the
assignment which are marked as being absent or present. The
following check list illustrates the kinds of characteristics which
might be included in evaluating a paper such as that assigned in the
illustration.


Specimen Check List for Evaluating a Written Report

                                      Yes    No

1. Well-organized                     ___    ___
2. Based on research                  ___    ___
3. Covers topic adequately            ___    ___
4. Stays within assigned topic        ___    ___
5. Free of spelling errors            ___    ___
6. Free of grammatical errors         ___    ___
7. Neat writing or typing             ___    ___
(plus other items)


The items to be included in a list of this type will naturally vary
from teacher to teacher and from assignment to assignment. The
check list is the simplest way of recording the presence or absence
of certain qualities of a product or procedure being evaluated.

Sometimes the teacher will not be willing to use a simple check
list, especially when some of the items to be evaluated exist in
varying quantities rather than merely being present or absent. By
providing more than two options for each item in the list, the
teacher is better able to reflect the varying degrees of goodness
of the characteristics being evaluated. Instead of using the check
list illustrated above, the teacher might want to allow for more
refined evaluation by rating each characteristic on a five-point
scale where 5 is outstanding, 3 is average, and 1 is
unsatisfactory. With this variation the device is normally called a
rating scale and would appear as follows:


                                Unsatisfactory     Average       Outstanding
1. Organization                       1       2       3       4       5
2. Research                           1       2       3       4       5
3. Coverage                           1       2       3       4       5
4. Stays within assigned topic        1       2       3       4       5
5. Freedom from spelling errors       1       2       3       4       5
6. Freedom from grammatical errors    1       2       3       4       5
7. Quality of writing or typing       1       2       3       4       5
(plus other items)


By circling the appropriate number for each characteristic to be
evaluated, the teacher expresses her judgment of the quality of that
particular characteristic. The over-all rating for the paper may be
obtained by adding the points for the separate characteristics to
give a total score for the paper as a whole.

In both the check list and rating scale illustrated above, no effort
has been made to weight the various characteristics evaluated.
Usually some of the characteristics of the assignment being
evaluated will be considered to be more important than other
characteristics. If this be the case, the rating scale just
considered can be modified to allow for this fact. Suppose that in
the previous example the first four characteristics were considered
more important than the last three characteristics. This difference
in importance could be reflected in assigning more possible points
to each of the first four characteristics and fewer to the
remaining three. Such a rating scale might look like this:


                                Unsatisfactory     Average       Outstanding
1. Organization                       1       2       3       4       5
2. Research                           1       2       3       4       5
3. Coverage                           1       2       3       4       5
4. Stays within topic                 1       2       3       4       5
5. Freedom from spelling errors       1               2               3
6. Freedom from grammatical errors    1               2               3
7. Quality of writing or typing       1               2               3
(plus other items)

By varying the number of points which can be earned for each
characteristic of the assignment evaluated, weighting can be
achieved to conform to the teacher’s ideas of the proper amount of
importance to be given to each characteristic.
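
The weighting just described amounts to giving each characteristic its own maximum number of points. The following Python sketch is an illustration only; the function name `total_points` and the sample ratings are hypothetical, while the characteristics and their maximums (5 for the first four, 3 for the last three) are taken from the weighted scale above.

```python
# Maximum points per characteristic, following the weighted scale above.
MAX_POINTS = {
    "Organization": 5,
    "Research": 5,
    "Coverage": 5,
    "Stays within topic": 5,
    "Freedom from spelling errors": 3,
    "Freedom from grammatical errors": 3,
    "Quality of writing or typing": 3,
}


def total_points(ratings, max_points=MAX_POINTS):
    """Add the ratings after checking each one against its allowed range."""
    total = 0
    for characteristic, limit in max_points.items():
        if characteristic not in ratings:
            raise ValueError(f"no rating given for {characteristic!r}")
        rating = ratings[characteristic]
        if not 1 <= rating <= limit:
            raise ValueError(f"{characteristic!r} must be rated 1 to {limit}")
        total += rating
    return total


# Hypothetical ratings for one pupil's report.
ratings = {
    "Organization": 4,
    "Research": 5,
    "Coverage": 3,
    "Stays within topic": 5,
    "Freedom from spelling errors": 2,
    "Freedom from grammatical errors": 3,
    "Quality of writing or typing": 2,
}
print(total_points(ratings), "of", sum(MAX_POINTS.values()))
```

Under this weighting the first four characteristics can contribute 20 of the 29 possible points, so content and organization dominate the total.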


Most teachers will add the points awarded to each of the separate
characteristics to obtain a total number of points for the
assignment. Then they convert the total points into a letter grade
which can be recorded in a record book and which can be incorporated
into an over-all grade for the semester. Even though this is done,
the detailed evaluation of the product or procedure provided by the
check list or rating scales is more informative to the pupil than
the over-all grade and should be given to the pupil so that he can
see what his strong points and weak points were on the assignment.
Another example of a rating scale is the following one used to
evaluate an original geometric design drawn with a pencil
and compass. Similar scales could be devised for use in assessing a
painting or ceramic piece in art, a table in a woodworking class, or
a skirt pattern in homemaking.


Rating Scale for Original Design in Geometry


The design:
1. Shows originality                        1    2    3    4    5
2. Exhibits the stipulated directions as    1    2    3    4    5
   to size, type of paper, etc.
3. Exhibits skillful use of tools           1    2    3    4    5
4. Utilizes principles of geometric         1    2    3    4    5
   construction
5. Exhibits accepted characteristics of     1    2    3    4    5
   good design
6. Demonstrates the expenditure of effort   1    2    3    4    5

1 = The design is poor in this respect.
2 = The design is below average in this respect.
3 = The design is average in this respect.
4 = The design is above average in this respect.
5 = The design is superior in this respect.


The following suggestions should prove useful in devising and
using rating scales or check lists for evaluating products and
procedures.

1. Include only measurable characteristics in the list.

2. Keep the list of characteristics as short as possible while
including the important characteristics to be evaluated.

3. If some characteristics are more important than others, provide
for this by assigning more “possible points” to the more
important ones.

4. In using the check list or rating scales, beware of the “halo”
effect. This error is made when the teacher assigns points on
the separate characteristics on the basis of some over-all
impression of the product or procedure, or even of the student
who produced it. Thus, an average product produced by a
pupil whom the teacher regards as being a good student
may receive a higher score than a good product produced
by a pupil whom the teacher thinks of as being an average
student.


EXERCISES 


1. List five products which might be developed or constructed by
pupils in your field.

2. List five performances in your field which relate to achievement
and which require evaluation.

3. Develop a check list for evaluating a product or a performance
in your teaching area.

4. Develop a rating scale for evaluating a product or a performance
in your teaching area.

5. Obtain a check list or rating scale presently being used at some
level of education. Examine the check list or rating scale to see
if you can suggest any improvements.

6. As a class project, develop a check list or rating scale to evaluate
a product or performance assigned in the class in which you are
studying this text.


SUGGESTED ADDITIONAL READINGS 


Micheels, W.J., and Karnes, M.R. Measuring Educational
Achievement.

Remmers, H.H., and Gage, N.L. Educational Measurement and
Evaluation.

Thomas, R.M. Judging Student Progress.


CHAPTER 5 


Evaluating Typical Behavior with 
Teacher-devised Instruments 


Meaning of typical behavior 


In the preceding chapters methods of measuring pupil achievement
have been discussed. These methods are used when the
teacher wishes to determine what the student can do when he is 
trying to do his best. It is assumed that pupils will try to get the 
best scores they can on tests or on assigned papers. In addition to 
these measures of “best” behavior, the school is often interested in 
evaluating the pupils’ customary or typical behavior. The differ- 
ence between “best” behavior and typical behavior can be illus- 
trated by considering the problem of evaluating a boy’s ability to 
drive a car. If he is given an examination for the purpose of 
granting him a driver's license, there is little doubt that he will try 
his best to obey all of the rules and to handle the car with care. 
This is an example of test behavior. Even though the boy passes the 
driving test with a high score, there is no assurance that he will 
customarily or typically drive with the same degree of skill or 
caution. 

The specific typical behaviors which the teacher will be interested 
in evaluating will be determined by the specific objectives of the 
school and of the teacher. Generally, the behaviors evaluated will 
fall into the areas of personal and social behavior. Such characteristics
as work habits, ability to work with others, and citizenship are
among those typical behaviors which are evaluated in many classrooms
today.


Evaluating typical behavior through observation 


The principal method utilized in evaluating typical behavior is
observation. Teachers spend much of their time observing their
pupils. If the teacher knows what she is seeking as she observes
her pupils, she can obtain much valuable information about them.
Observations may be either planned or informal. When the teacher
is observing selected pupils for specified behaviors, the
observations are planned. Informal observation takes place when
the observer notices, and eventually records, behaviors which arise
unexpectedly. Whether planned or informal, observations should
be recorded. A check list or rating scale may be utilized to record
the observations. For recording observations about the work
habits of a pupil a check list like this might be utilized:


                                                        Yes    No

Begins work without delay                               ___    ___

Has necessary books and supplies                        ___    ___

Continues work without unnecessary interruptions        ___    ___

Asks questions if he does not understand assignment     ___    ___


Another method of recording the observation would be a rating
scale. One form of scale for rating work habits would be:


1            2            3            4            5

Does not do assigned work             Starts to work immediately
Does not have necessary books         Has necessary books and
  or supplies                           supplies
Interrupts other pupils               Continues to work steadily
Does not ask questions if             Asks questions if necessary
  necessary


To use this scale, the teacher would make a mark on the scale to
indicate the work habits of the pupil. Five would indicate the
best work habits and one the poorest.


In the rating scale above, only the two extremes of behavior are
defined. Additional refinement can be obtained by similarly defining
the three gradations between the extremes.

A slightly different approach to recording typical behavior, which
is used in special instances, is that referred to as time sampling. To
study a particular pupil, an observer may observe his behavior during
a specified period of time. A report of this kind might read
as follows:


10:30    John comes into the room, sits down quietly, and after
         inspecting several books, chooses his history text.
10:31    Begins to study.
10:31½   Speaks to his neighbor.
10:32    Returns to his studying.
10:33    Sharpens his pencil.
10:33½   Returns to seat and stares out of the window.


Obviously, the regular teacher in the classroom cannot carry on
much of this type of observation, although the value of this report
for diagnosing special problems is obvious.

Evaluation of an objective through observation requires that a
sufficiently large sample of behavior be included. Thus, if the
teacher were evaluating work habits she should be sure to observe
each of the pupils on many different occasions during the semester
to be certain that what she has observed was typical behavior rather
than atypical behavior.

By spending a few minutes each day in observing the
behavior of a few pupils, and by observing different pupils each day,
the teacher can soon get to know her pupils much better than if
she does not systematically observe them as individuals. When
classes contain between thirty and forty children, many of the
children will go unnoticed unless the teacher focuses her attention
on them. Of course, the problem child always receives his share of
attention and notice, but the majority of the class may go unnoticed
as individuals. Then, when the time arrives for a formal evaluation
of a pupil’s behavior or a discussion with his parent, the teacher
is suddenly faced with the fact that she really knows very little about
the child’s typical behavior in the classroom.


Informal observation: The anecdotal record


An anecdotal record consists of a simple, factual account of an
observed incident. As a general rule anecdotes should be written
for two purposes: (1) to record unusual incidents, and (2) to
study particular pupils. Unusual incidents which occur should
always be recorded, even though the pupils involved may be well-adjusted
individuals. These anecdotes become part of the pupil’s
record and are kept in a folder along with other information pertinent
to evaluating the child’s behavior. The availability of data
of this nature places the teacher in a position to understand her
pupils better and to report their school behavior more intelligently.

For pupils who are emotionally disturbed, who fail to get
along with other children, or who have other behavior problems,
the teacher should use anecdotal records extensively. In these cases,
the teacher might follow these students through a day or week,
writing reports at regular times. Another system involves writing
anecdotal reports on these pupils for every activity during a short
period. These reports are very helpful when, with the principal,
parent, counselor, or school psychologist, a plan of action is
developed to help the child with his adjustment problems.

Anecdotes should be restricted to an account of the actual behaviors
observed. Judgments of the rightness or wrongness of the
behavior should not be included. If the teacher has an opinion as
to the cause of the behavior, such opinion may be included but
should be clearly labeled as teacher opinion rather than actual
observed behavior.

The following anecdote is acceptable because it confines itself to
reporting observed behaviors:


Name: John Doe                          Date: May 7, 1957

At nutrition John grabbed a cookie from Mary S. When she
tried to get it back, he pushed her. I separated them and told
John to return the cookie. He did so. When I asked him why he
had taken the cookie, he would not answer.


A poor anecdote based on this same incident might read like this:


Name: John Doe                          Date: May 7, 1957

John caused a commotion at nutrition today. He is a chronic
trouble-maker. It is his parents’ fault for not teaching him better
manners.


Anecdotes should not be collected as an end in themselves. Unless
the anecdotes are actually used in some way, the time taken to
write them is wasted.


Obtaining information from the informal reports of pupils


In trying to understand their pupils, teachers often need informa- 
tion about a child’s interests, how he spends his out-of-school time, 
and other data which cannot be acquired by direct observation. 
This type of information may be obtained in an informal manner 
from the students by talking with them, or by having them give oral 
or written reports containing the type of material that the teacher 
is seeking. Examples include such assignments as
“My Autobiography,” “My Favorite Hobby,” and “How I Would
Spend One Hundred Dollars.” These topics can be assigned for
the purpose of providing practice in written or oral expression, and
the content of the reports can be of great assistance to the teacher.


Obtaining information about pupils from their peers 


In addition to obtaining information about the pupil directly 
from the pupil, it is also possible to obtain information about 
pupils from their peers—the other children in the class. Sometimes 
information can be obtained about pupils by discreet questioning of 
their classmates. Teachers should use this procedure with caution. 

Information relative to the social structure of the entire class 
can be gained by using the methods of sociometry. Although skillful
use of sociometric techniques requires additional training and
experience, the basic idea is simple. Pupils are asked to list the two 
or three pupils with whom they would most like to serve on a com- 
mittee, play a game, or participate in other similar activities. With 
this information it is possible to compose a graphic picture of the 
social relationships in a class, and by tabulating the number of 
times each pupil was chosen by the others in the class, the teacher 
can readily determine the pupils most popular with their classmates 
as well as those not chosen by any of their classmates. Utilization of 
these techniques enables the teacher in a short time to gather con- 
siderable information about the interrelationships of the various 
members of her class. 
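
The tabulation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pupil names and choices are hypothetical, and the helper names `tabulate_choices` and `mutual_choices` are invented here rather than taken from any sociometric manual.

```python
from collections import Counter


def tabulate_choices(choices):
    """Count how many times each pupil was chosen by classmates.

    `choices` maps each pupil to the list of classmates he or she named.
    Every pupil appears in the result, including those chosen by no one.
    """
    counts = Counter({pupil: 0 for pupil in choices})
    for chooser, named in choices.items():
        for pupil in named:
            if pupil != chooser:  # ignore a pupil naming himself
                counts[pupil] += 1
    return counts


def mutual_choices(choices):
    """Return the set of pairs of pupils who chose each other."""
    pairs = set()
    for chooser, named in choices.items():
        for pupil in named:
            if pupil != chooser and chooser in choices.get(pupil, []):
                pairs.add(frozenset((chooser, pupil)))
    return pairs


# Hypothetical data: five pupils, each naming two committee partners.
choices = {
    "Ann": ["Bob", "Cal"],
    "Bob": ["Ann", "Cal"],
    "Cal": ["Ann", "Bob"],
    "Dee": ["Ann", "Bob"],
    "Eve": ["Cal", "Dee"],
}

counts = tabulate_choices(choices)
most_chosen = max(counts.values())
popular = sorted(p for p, n in counts.items() if n == most_chosen)
isolates = sorted(p for p, n in counts.items() if n == 0)
```

With these made-up choices, `popular` holds the pupils chosen most often and `isolates` holds any pupil chosen by no one. The mutual pairs returned by `mutual_choices` correspond to the two-way arrows a teacher would draw on a sociogram.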

The graphic picture of the social relationships in a class is called 
a sociogram. The sample sociogram on page 55 shows the patterns 
of choice for a group of ten pupils. Each pupil was asked to name 
two other pupils that he or she would prefer to serve with on a 
social studies committee and to identify them by first choice and 
second choice.


A Sample Sociogram

[Figure. Key: two-headed arrow = mutual choice; one-headed arrow =
one-way choice; 1 or 2 = first or second choice; circle = girl; a
second symbol = boy.]


Sociograms are measures of social acceptance and as such can be 
used by the teacher in the following ways: (1) to identify isolates 
who can then be helped in building improved relationships with 
other children, (2) to identify groups and cliques, which because 
of race, religion, or status need to be absorbed into the class in 
order for effective group action to take place, and (3) to establish a
basis for separating a class into groups of pupils who can be 
expected to work well together. 

The “guess-who” technique is another means of collecting data 
about pupils from their peers. In the “guess-who” procedure, 
pupils are given statements describing types of behavior and are 
asked to designate other pupils who best fit these descriptions. 
Statements related to such traits as the following might be included 
in a “guess-who” questionnaire: neat and clean; takes care of public 
property; often gets angry; obeys orders; nobody likes very much;
and good at baseball. A sample “guess-who” questionnaire follows.


You and Your Classmates—A Guess-who Questionnaire*


Name ____________________          Date ____________

 1. Which children sit very still and quiet?
 2. Which children wiggle a lot and can’t sit still?
 3. Who are the ones everyone likes?
 4. Who are the ones nobody likes very much?
 5. Which children are always smiling and laughing?
 6. Which children don’t smile very much and seem sort of sad?
 7. Which children are bossy?
 8. Which children let the other children boss them?
 9. I would like best to work with ____________
10. Which children are most bashful?
11. Which children aren’t the least bit bashful?
12. Which children are the best at outdoor games?
13. Which children aren’t very good at games?
14. Which children get mad the easiest?
15. Which ones don’t get angry much?
16. I would like most to be like ____________
17. I would like to have ____________ for class president.


In order to keep choices from being forced, pupils should be 
informed that it is not necessary to name a pupil for each item if no 
one in the class fits the description. Accordingly, items should be 
included in the questionnaire which do not refer to specific indi- 
viduals. 

The information collected with a device of this kind is useful 
in countless school situations. “Guess-who” questionnaires have
been developed to collect data for guidance, for identifying values 
important to children, and to indicate the degree of social accept- 
ance of groups and individuals. 
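
Summarizing a set of completed “guess-who” questionnaires amounts to counting how often each pupil is named for each item. The following Python sketch shows one way this might be done. The item labels, pupil names, and the function name `tally_nominations` are all hypothetical, and blank answers are permitted, as the text recommends.

```python
from collections import Counter, defaultdict


def tally_nominations(responses):
    """Count, for each item, how often each pupil was named.

    `responses` is a list of dicts, one per respondent, mapping an item
    to the list of classmates named. An empty list is a blank answer.
    """
    tallies = defaultdict(Counter)
    for answer_sheet in responses:
        for item, named in answer_sheet.items():
            tallies[item].update(named)
    return tallies


# Hypothetical answer sheets from three pupils.
responses = [
    {"bossy": ["Tom"], "bashful": ["Sue", "Ray"]},
    {"bossy": ["Tom"], "bashful": []},
    {"bossy": [], "bashful": ["Sue"]},
]

tallies = tally_nominations(responses)
```

A pupil named repeatedly on one item, or never named on any, stands out at once in such a tally. The counts are only a starting point for the teacher's own judgment.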


* Reproduced with permission of Los Angeles County Schools Office.


EXERCISES


Differentiate between typical behavior and test behavior using 
college students’ grammar as an example. 

Develop a check list to be used in observing courtesy in a sixth- 
grade class. 

Develop a rating scale to be used in observing health habits for 
a grade level of your choice. 

Develop a rating scale to be used in observing habits of critical 
thinking. Identify the class and grade level where the scale is to 
be used. 

Arrange to have two members of your class act out some incident 
that might occur in a classroom. Have the other members of the 
class write an anecdotal record about the incident. Compare 
and discuss the anecdotal records prepared by various members 
of the class. 

Develop a “guess-who” questionnaire after identifying objectives 
and grade level. 


SUGGESTED ADDITIONAL READINGS 


Magnuson, H.W., et al. Evaluating Pupil Progress.
Staff, Division on Child Development, American Council on Education.
Helping Teachers Understand Children.
Thomas, R.M. Judging Student Progress.
Torgerson, T.L., and Adams, G.S. Measurement and Evaluation.


CHAPTER 6 


Summarizing and Reporting Pupil 
Achievement and Typical Behavior 


Among the more complicated tasks the teacher undertakes is that 
of “marking” or “grading.” Most teachers would be relieved to be 
free of the necessity to mark or grade their pupils, but some kind of 
report is necessary for several reasons. 

First, parents want and have a right to know how their children 
are progressing in school. Second, some sort of permanent record 
is needed by school personnel to aid them in proper placement, 
promotion, and guidance of pupils and to provide the data neces- 
sary to evaluate a pupil's achievement when prospective employers 
or colleges request information. Third, the classroom performance 
of the typical pupil is influenced, at least to some degree, by marks 
or grades reported. Usually summary marks or grades are not 
needed by the pupil for the purpose of knowing how he is doing in 
any given subject. His day-to-day successes and failures reveal to 
him with a fair degree of precision his progress or lack of it. Were 
periodic reports given only for the pupil's use, they might be 
eliminated. 


The following are the most widely used methods of reporting to 
parents.


Parent-teacher conferences


Face-to-face meetings between teachers and parents have been 
most widely used at the elementary school level. Such meetings pro- 
vide an excellent opportunity for the teacher to convey informa- 
tion regarding the pupil’s achievement and typical behavior to the 
parent. The parent finds out about the nature and purpose of 
classroom activities. The teacher obtains more information about 
the pupil’s out-of-school behavior and environment, which is often 
helpful in understanding the pupil. In spite of the advantages of 
the method, there are certain practical difficulties which must be 
overcome. These meetings are time-consuming, especially at the 
secondary level, where a teacher may teach as many as 150 to 200 
different pupils during one semester. Also, getting parents to come 
to school for the conference sometimes can be difficult. Not all 
teachers are skilled at face-to-face reporting and therefore may be 
ineffectual in parent-teacher conferences. Unless thought and effort 
are used, parent-teacher conferences can become stereotyped, with 
the teacher using a few pat phrases to describe the achievements 
and behavior of her pupils. If the difficulties inherent in the 
method can be resolved, the parent-teacher conference is a most 
desirable means of reporting to parents. 


Letters to parents 


The next most flexible way of reporting to parents is the unstruc- 
tured letter. This method of communication allows comments on 
any aspects of the pupil’s achievements and behaviors which should 
be reported to the parents. Writing good letters which are truly
individual and which convey the exact meaning intended is a difficult
task. There is a wide variation in teachers’ ability to write
letters of this type. Letters are time-consuming, particularly if 
clerical help is not available. But assuming that the time is avail- 
able and the ability to write good letters is present, this means of 
report can be very effective. As with the conference method, the 
“letter to parents” is most likely to be used in the elementary school. 


Check lists 


More structured than the informal letter is the check-list type of
report listing a number of descriptive phrases which can be checked
to indicate that the phrase applies to the pupil. Such check lists
tend to be rather long with many specific judgments called for by
the teacher. Such phrases as the following might be included in
that portion of a check list which describes the pupil’s achievement
in reading.

1. Reads with understanding.
2. Reads well aloud.
3. Handles new words efficiently.
4. Reads independently.


Since there are many different objectives to be evaluated at most
grade levels, and since each objective involves the use of one or
more phrases on the check list, most lists tend to be very long.
Although the check-list type of report can be highly informative
for the parent, the large number of different items
sometimes leaves the parents confused. Check lists are more widely
used in the elementary rather than the secondary schools.


Report cards 


U means unsatisfactory. Many other sets of symbols are used to
report achievement. There is no reason why a school should not
adopt any set of symbols for use in reporting as long as the
meanings of the symbols are clearly understood
by all who use them. Few report cards today consist exclusively of
marks in subject matter. Almost all cards include some provision
for reporting such typical behaviors as “citizenship,” “work habits,”
“effort,” or the like.

Although the traditional report card is a much simpler method 
of reporting than any of the other methods mentioned thus far, it 
is also the one which provides the least information for the parent. 
This difficulty arises from the fact that the marks usually are poorly
defined and no one knows exactly what a given mark means except
the person who gave it. 

Many modern report cards combine elements of the traditional 
report card and the check list to provide a more flexible means of 
communication. 

The progress report reproduced on pages 62-65 is a good example 
of a combination report card and check list. Notice especially the 
variety of educational objectives that can be evaluated, and the 
fact that growth in knowledge and skills can be reported in terms 
of position within the class group and in terms of the pupil’s own 
ability to achieve. 


Cumulative records 


In addition to reporting pupil achievement and typical behavior 
to parents, almost all schools maintain some type of cumulative 
record in which the marks assigned to pupils throughout their 
school years are recorded. Such cumulative records usually contain,
in addition, information regarding pupils’ attendance, records of
standardized test results, anecdotal records, health data, records of 
participation in school and extracurricular activities, and data 
about the home and family and about pupils’ interests and objec- 
tives. By referring to these cumulative records the teacher can 
quickly obtain much important information which will help her 
in understanding her pupils. These records contain the necessary 
material to answer questions from employers or other schools and 
colleges.


Basis for assigning grades 


The “A” Johnny brings home in reading can have a variety of
meanings. It can mean that Johnny is reading consistently above 
his grade level. It can mean that Johnny is one of the better 
readers in his class even though not reading above grade level. It 
can mean that Johnny is reading as well as the teacher thinks he 
can, even though he may actually be one of the poorer readers 
in the room in terms of absolute achievement. The grade in reading 
might even be based on the fact that Johnny is a “good” boy and 
never causes the teacher any trouble in class. Teachers often mistakenly
give high grades to children who are “good” in class
behavior even though they may not be equally good in their
achievement. It is obvious then that the “A” on Johnny’s report
card is meaningless until the method used in determining the grade
is known.


A Modern Progress Report

(Reproduced by permission of Eli R. Steed, Superintendent,
Rivera School District, Rivera, California.)


PROGRESS REPORT
FOURTH, FIFTH AND SIXTH GRADE

Name ____________________

MESSAGE TO PARENTS

The staff of the Rivera Elementary School District believes that
the education of the pupil is a cooperative enterprise in which the
home and school should work closely together. The school strives to
help the child develop those skills and attitudes necessary for a
desirable citizen in our democracy.

This report with individual parent-teacher conferences should give
the home a picture of the child’s progress.

Very truly yours,
Eli R. Steed
District Superintendent

Principal ____________________
Teacher ____________________


ATTITUDES AND BEHAVIOR

Each item is marked “Almost always,” “Part of time,” or “Very
seldom” in each of three reporting periods.

Work and Study Habits
    Listens attentively
    Follows directions
    Uses time wisely
    Does neat and careful work
    Begins and finishes work on time
    Speaks only in turn

Social Development
    Gets along well with others
    Accepts responsibility
    Respects rights and property of others
    Recognizes and solves own problems
    Is courteous
    Is developing self control

Health and Safety
    Practices cleanliness
    Maintains good sitting and standing posture
    Obeys rules and regulations

Days Absent          Height and Weight          Conference


GROWTH IN KNOWLEDGE AND SKILLS

Explanation of marks in the first column: Achievement level, based
on the child’s standing in his class group, as shown by standardized
tests, teacher-made tests, and teacher observation.

Explanation of marks in the second column: Marks in this column show
the relationship between a child’s achievement and his ability to
achieve.

    1 = Commendable, is doing unusually good work in terms of
        his own ability
    2 = Satisfactory, making progress consistent with his ability
    3 = Needs to improve, progress not consistent with his ability

Desirable growth is listed below each subject. You will find an “N”
where your child needs to improve.

Marks are entered for Nov., Mar., and June.

Reading
    Reads with understanding
    Applies phonetic understandings
    Uses dictionary skills
    Shows interest in independent reading
    Reads well orally
    Finishes reading assignments

Arithmetic
    Understands processes taught
    Works accurately
    Solves word problems

Language
    Expresses ideas effectively in speaking
    Expresses ideas effectively in writing
    Uses language fundamentals correctly

Spelling
    Spells carefully in all written work
    Knows words in spelling list

Social Studies
    History & Geography of ____________
    Understands and ... social studies ...
    Contributes to group activities
    Interprets globes, maps, and charts

Science
    Shows growth in scientific facts and understandings
    Uses scientific method in solving problems

Handwriting
    Uses neat and legible handwriting

Music
    Responds to and enjoys music
    Shows growth in musical activities

Art
    Shows growth in ability to express ideas creatively
    Shows interest in and enthusiasm for art
    Uses art materials with care and understanding

Physical Education
    Shows growth in good sportsmanship
    Grows in ... skills

RECOMMENDED PLACEMENT ____________

There are three main ways of assessing achievement—(1) in 
relation to absolute standards, (2) in relation to the performance 
of the other pupils in the same class or grade, and (3) in relation 
to the pupil’s own ability. The differences between these can be 
illustrated by considering the case of the pupil in the fifth grade 
who has an I.Q. of 85 and whose reading grade placement is 4.5,
which means he reads as well as the average child in the fifth month 
of the fourth year. If this child were graded on the basis of absolute 
standards, he would probably receive a C or D in reading. If he 
were a member of a slow class, he might be one of the better 
members of the class and, if graded on his performance relative 
to that of his classmates, might receive a B. If he were graded on 
the basis of his achievement in relationship to his own ability he 
might even receive an A. Thus, depending on the method of
grading used, this pupil might receive any grade from A to D for
the same work. When each teacher in a school independently
decides what a grade should mean and on what basis it should be
assigned, the grades become meaningless. It is essential that
throughout a school system there be as much agreement as possible
on the basis for grading and on the meaning of the symbols used in
reporting grades.

In the preceding example there were two facts which should 
have been conveyed to the parent: (1) the pupil was reading as
well as could be expected in terms of his ability, and (2) he was 
reading below the average child in his grade. No single grade 
can convey both of these pieces of information. It is, of course, 
possible to report the two separately. Any system of assigning 
grades which will convey both of these facts is superior to a single- 
symbol system. The addition of a series of check-list phrases which 
would enable the teacher to indicate the strengths and weaknesses 
of the pupil in various aspects of reading would strengthen the 
reporting system. 

The study of the Progress Report reproduced on pages 62-65 
will show that provisions have been made for separately reporting 
achievement with respect to class standards and with respect to the 
pupil’s own ability. Report cards used in secondary school usually
report only a single mark in a subject, one based on the pupil's
achievement relative to the other pupils at the same grade level. 


Collecting the evidence on which to base the report 


Whatever form of reporting is used, the teacher must collect evi- 
dence on which to base the report. No system of reporting which 
depends entirely on the subjective impressions and memory of the 
teacher can be fair to the pupils. The teacher needs evidence in 
the form of records of the pupil's achievement and typical behavior. 
Such records are usually kept in the teacher's classbook or rollbook 
and/or in some kind of notebook with a separate page or section 
devoted to each pupil. Since the teacher must usually collect many 
kinds of evidence, and since some of the evidence must be recorded 
in the forms of anecdotes or comments, there is usually not sufficient 
room in the classbook alone to meet this need. Some teachers prefer 
to keep a folder for each pupil and record all information pertain- 
ing to the pupil in the folder. The use of a folder enables the 
teacher to preserve samples of the pupil’s work. These samples of 
work can be used to good advantage during a parent-teacher con- 
ference. 

Consideration of the objectives to be evaluated will determine the
kinds of evidence to be collected (see Chapter 1 for an illustration
of collecting evidence on “work habits”). Only by deciding what
kinds of evidence to collect, and by setting up a system for collecting
and recording this evidence, can the teacher be in a position to
report adequately and fairly on her pupils’ achievements and typical
behaviors.


Improving marking and grading practices 


From the preceding discussion it should be clear that the process 
of reporting is a complex one with many pitfalls for the unwary. 
Since most teachers are required to report their pupils' progress,
the problem becomes one of improving marking and reporting pro- 
cedures. Usually such improvement can best be achieved by the 
cooperative efforts of teachers, administrators, and parents. 

The following criteria should be met in any system of marking 
and reporting. 

1. The system should communicate all of the important information
about the pupil’s achievement and behavior to the parent.

2. The meanings of the symbols used and the basis for assigning
such symbols should be clearly understood by both parent and
teacher.

3. All teachers of the same educational level in a particular
school or school district should have the same philosophy of
marking. This is essential if marks are to have any meaning.

4. The system of marking and reporting should not be so complicated
as to place an undue burden on the teacher.


Quantitative aspects of marking 


Assigning marks is essentially a philosophic rather than a statistical
process. However, most teachers use numbers in some way in
assigning marks and in arriving at summary grades for reporting
purposes.

Consider that a spelling test of twenty words has been given to
a seventh-grade class and that one of the pupils in the class spelled
fifteen of the words correctly. What score should the student receive
on the test? One method of scoring would be to use the number
right (fifteen) and enter this number in the record book. This
kind of score is called a raw score. Another method of scoring
would be to use the percentage system and assign a score of 75
percent since each one of the twenty words represented 5 percent
of the total score. Still another method would be to assign a letter
grade (e.g., A, B, C, D, or F) to the test and record that grade in
the record book.

Which method should a teacher use? To answer this question the
meaning of a test score must be considered. Actually what is known
about this pupil's spelling ability from the fact that he got fifteen
right out of the twenty words on the test? The answer is “almost
nothing.” Suppose that the test were composed of twenty easy
words such as “cat” and “dog.” Missing even five of such easy words
would constitute very poor spelling ability for the seventh grade
and might even represent the performance of the poorest speller
in the class. On the other hand, if the test were composed of
twenty very difficult words, fifteen right might represent the
performance of the best speller in the class. Thus it is apparent that
raw scores or percentage scores by themselves are meaningless. An
exception to this rule is found in tasks that are so standardized that
the raw scores do represent meaningful quantities by themselves.
Such scores as “60 words per minute—typing” and “10 seconds to
run 100 yards” have common meaning throughout the country.
Unlike these last two examples, scores such as “15 right out of 20”
in spelling or 85 percent in arithmetic have no common meaning
because of differences in the difficulty of the material contained in
various tests.
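
The arithmetic behind the percentage score in the spelling example can be written out as a short Python sketch. The function name `percentage_score` is invented here for illustration. It assumes only what the example states: every item on the test carries equal weight.

```python
def percentage_score(number_right, number_of_items):
    """Convert a raw score (number right) into a percentage score."""
    if number_of_items <= 0:
        raise ValueError("a test must contain at least one item")
    return 100 * number_right / number_of_items


raw_score = 15  # fifteen words spelled correctly
print(percentage_score(raw_score, 20))  # 75.0
```

The conversion changes only the scale of the number. It says nothing about how hard the twenty words were, which is why the percentage is no more meaningful than the raw score from which it was computed.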

Consider two pupils in the seventh grade. Johnny and Mary are
in the same school but have different teachers. Although they study 
the same spelling words in both classes, Johnny’s teacher gives a 
test containing most of the easy words and few of the hard ones, 
while Mary’s teacher gives a test containing almost all hard words. 
Johnny receives a grade of 85 percent and Mary (who is actually 
a better speller than Johnny) receives a grade of 75 percent. Does 
this mean that Johnny actually spells better than Mary? Although 
this is probably the way in which Johnny's mother would interpret 
these two scores, it is obvious that these scores cannot be fairly 
compared. 

A better procedure than using raw scores or percentage scores is 
that of converting the raw scores into some kind of score which 
reflects the pupil's position in the class. Suppose that a teacher has 
given a twenty-item spelling test to her class and has counted and 
recorded the number right on each paper. She does not want to re- 
cord either a raw score or a percentage score, and decides to assign 
letter grades to the test papers. Some teachers convert percentage 
scores into letter scores by such definitions as “90 percent to 100 
percent equals A.” This type of conversion accomplishes nothing,
since the basic objection to raw scores and percentage scores is not
eliminated. A better way to assign letter grades is to make a
frequency distribution of the test scores and then assign grades by
cutting the distribution at those points which will result in a fair 
distribution of grades for the particular test. Table 4 on page 70 
is an example of a frequency distribution with letter grades assigned 
to the raw scores. 

Many teachers employ plus and minus signs to provide a greater 
range of grades. Thus, instead of just marking a paper B, it be- 
comes possible to mark it B, B+, or B-. 

Another method of assigning marks is to use a nine-point numeri- 
cal system. In this system nine represents the highest mark, five the 
average mark, and one the lowest mark. These nine numbers corre- 
spond to letter grades as shown in Table 5 on page 70. 
Table 4  Assigning Grades from a Frequency Distribution

Score on Test    Number Receiving Score    Letter Grade
     20
     19                    1                    A
     18                    1
     17                    2
     16                    3                    B
     15                    2
     14                    2
     13                    4
     12                    5                    C
     11                    2
     10                    1


The nine-point system has the advantage that it does not use
plus or minus signs and that the marks recorded are numbers which
can be averaged at the end of the semester.


Table 5  A Nine-point Marking System

Number    Letter equivalent
  9       A
  8       Between A and B
  7       B
  6       Between B and C
  5       C
  4       Between C and D
  3       D
  2       Between D and F
  1       F
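
As a rough illustration of why numerical marks are convenient, the sketch below averages a set of nine-point marks and looks up the letter equivalent from Table 5. The marks are hypothetical, and rounding the average to the nearest whole mark is an assumption made for the example; the text does not prescribe a rounding rule.

```python
# Letter equivalents of the nine-point marks, as listed in Table 5.
NINE_POINT = {
    9: "A", 8: "between A and B", 7: "B", 6: "between B and C", 5: "C",
    4: "between C and D", 3: "D", 2: "between D and F", 1: "F",
}


def average_mark(marks):
    """Return (mean, nearest whole mark, letter equivalent) for nine-point marks."""
    mean = sum(marks) / len(marks)
    nearest = int(mean + 0.5)  # assumption: round halves upward
    return mean, nearest, NINE_POINT[nearest]


# Hypothetical marks recorded for one pupil during a semester.
semester_marks = [7, 6, 8, 5, 7]
print(average_mark(semester_marks))
```

No plus or minus signs have to be translated before the marks are added, which is the advantage the text describes.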


To use either the A, B, C, D, F system or the nine-point system,
the procedure is as follows:
1. Assign a raw score to each paper. This may be the number of
   questions right on a test or any other raw score obtained by
   marking a test, product, or procedure.

2. Make a frequency distribution of the raw scores. This consists
   of listing each possible score on the test and then tallying the
   number of pupils who received each score.

3. Determine which raw scores shall be equivalent to which letter
   grades or numerical grades. This determination is essentially a
   matter of teacher judgment rather than a statistical decision.
   Usually, more C’s will be given than any other grade unless the
   group is special in some way. There will usually be more B's
   than A’s and more D’s than F’s. When the group is atypical
   (either considerably better than average or considerably poorer
   than average) this rule will have to be modified. It is reasonable
   to expect particularly able classes to have a high percentage of
   A’s and B’s and particularly poor classes to have a high
   percentage of C’s, D’s and F's.
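
The three steps above can be sketched in a few lines of Python. This is only an illustration: the class scores and the cut points are hypothetical, and the cut points stand in for the teacher's judgment described in step 3 rather than for any statistical rule.

```python
from collections import Counter


def frequency_distribution(raw_scores, max_score):
    """List (score, number of pupils) from max_score down to the lowest score earned."""
    counts = Counter(raw_scores)
    lowest = min(raw_scores)
    return [(s, counts.get(s, 0)) for s in range(max_score, lowest - 1, -1)]


def assign_grade(score, cut_points):
    """cut_points holds (lowest score for the grade, grade), highest grade first."""
    for lowest, grade in cut_points:
        if score >= lowest:
            return grade
    return "F"


# Step 1: hypothetical raw scores on a twenty-item test.
scores = [19, 17, 16, 16, 15, 13, 13, 12, 12, 12, 11, 9, 7]
# Step 3: hypothetical cut points chosen after inspecting the distribution.
cuts = [(18, "A"), (15, "B"), (10, "C"), (8, "D")]

# Step 2: tally the distribution and show the grade each score would receive.
for score, count in frequency_distribution(scores, 20):
    print(f"{score:>2}  {count}  {assign_grade(score, cuts)}")
```

Changing the `cuts` list is all that is needed to try a different division of the same distribution.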

There is no statistically correct percentage of A’s, B’s, and so
forth, that should be given by all teachers in all classes. Teachers
sometimes mistakenly attempt to grade “on the curve,” that is,
assign the same percentage of each letter grade in every class they
teach. As was pointed out in the previous paragraph, the nature of
the group must be considered in assigning grades so that no one
system of percentages could possibly apply to all classes. There
should, however, be as much agreement as possible between the
grading policies of all the teachers in a school so that the kind of
work that is marked “C” by one teacher will not be marked “A” by
the teacher in the next room.


Weighting test scores 


Usually not all tests or other assignments are of equal importance.
Unless some system of weighting the marks given on the
various assignments is used, the mark on a five-minute quiz will
have as much weight in determining the final mark as the score on a
major examination. The simplest method of weighting test scores
is to record the score one time if it is of minor importance and to
record it two or more times for more important assignments. Thus,
a grade of B on a five-minute quiz might be recorded as a B, a grade
of C on a mid-term examination might be recorded twice (C, C),
and the grade of B- on the final examination might be recorded
four times (B-, B-, B-, B-).
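
The recording scheme just described amounts to repeating each mark as many times as its weight. A minimal sketch follows; the function name `weighted_record` is hypothetical, and the weights are simply the ones used in the example above.

```python
def weighted_record(marks):
    """Expand (grade, weight) pairs into the list a teacher would enter in the record book."""
    record = []
    for grade, weight in marks:
        record.extend([grade] * weight)  # record the grade once per unit of weight
    return record


# Quiz counted once, mid-term twice, final examination four times.
record = weighted_record([("B", 1), ("C", 2), ("B-", 4)])
print(record)
```

Any later averaging is then done over the expanded list, so the final examination counts four times as heavily as the quiz.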


Summarizing grades 


If the teacher has recorded grades for various tests, assignments,
and other class projects, then at the end of the marking period she
will have the task of distilling from all the separate marks one
summary mark (in each subject) to use for reporting purposes.
Suppose that at the end of the semester the teacher has the following
grades for a student in spelling:

B, C, D, B, F, A, B, C, C

What final grade should this student receive in spelling? To answer
this question requires that the grades be averaged in some way. One
method of averaging grades is to assign some numerical equivalent
to each grade, add up the total points earned, and then divide the
total by the number of scores which were included in the total. The
result of this procedure is technically known as the arithmetic mean.
Most people call it the “average,” but since there is more than one
kind of statistical “average,” it is necessary to be precise and refer to
this particular kind of an average as a “mean.” To obtain the mean
of the above grades in spelling, it is first necessary to convert the
letter grades to numbers. In this case let A equal 5, B equal 4,
C equal 3, D equal 2, and F equal 1. The pupil's scores in spelling
can now be represented as:

4, 3, 2, 4, 1, 5, 4, 3, 3

The total points earned by the student is 29, and since there are
nine scores included in the total, the mean score is 3.2 (29 divided
by 9). Since 3.2 is closer to 3 than to 4, the student receives a C in
spelling on the basis of his mean score.

Another kind of statistical average is the median. The median is
simply the middle score in the distribution when the scores are
arranged in order from the highest to the lowest. If the pupil's
spelling scores are arranged in this fashion they become:

A, B, B, B, C, C, C, D, F

Since there are nine separate scores, the fifth score (from either
end) is the median. In the example above, the median grade is C.
In case there are an even number of scores, the median is considered
to be midway between the two middle scores of the distribution.

The median has two advantages over the mean. First, it is much
easier to compute, particularly when each pupil has a large number
of separate grades. Second, it is more representative of the pupil’s
typical behavior because, unlike the mean, it is unaffected by
extremely high or low scores.

The majority of teachers employ the mean in summarizing grades,
but there is no reason why the median cannot be employed for this
purpose. Its use is especially recommended in those instances where
a large number of grades are to be averaged to arrive at a summary
grade.
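
The worked example can be checked with Python's standard `statistics` module. The sketch uses the grades and the A = 5 through F = 1 conversion given in the text; converting the mean back to a letter by rounding to the nearest whole number is an assumption that matches the "closer to 3 than to 4" reasoning, and it does not settle what to do with exact ties or with a median that falls midway between two grades.

```python
from statistics import mean, median

POINTS = {"A": 5, "B": 4, "C": 3, "D": 2, "F": 1}
LETTERS = {points: letter for letter, points in POINTS.items()}

grades = ["B", "C", "D", "B", "F", "A", "B", "C", "C"]
scores = [POINTS[g] for g in grades]

mean_score = mean(scores)      # total points divided by the number of scores
median_score = median(scores)  # middle score once the scores are put in order

print(sum(scores), round(mean_score, 1), LETTERS[round(mean_score)])
print(median_score, LETTERS[median_score])
```

Both averages lead to a summary grade of C for this pupil, as in the text.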


EXERCISES 


1. Organize a panel and discuss the advantages and disadvantages
of the A-B-C-D-F system of grading.

2. Based on your own philosophy, write a description of an ideal 
grading system. 

3. Assign a committee to investigate different forms used locally 
for report cards and report on the merits of each. 

4. What are the arguments for and against grading on the basis of 
the child’s ability? Do these same arguments apply to a system of 
grading on individual pupil growth? 

5. Find the mean of the following scores: 9, 7, 10, 8, 10, 8, 9, 9, 
6, 8, 4, 10, 8. 

6. Find the median of the scores in Question 5. 

7. A social studies pupil was given the following marks during one
week: 5-minute quiz, C-; midterm exam, B; 5-minute oral
report, C. Should these marks receive equal weight, and if not,
how would you weight them?

8. Assume that as a teacher you are scoring final exams; discuss how
you would determine the dividing line between the C’s and B's
and between the C’s and D's.


SUGGESTED ADDITIONAL READINGS


Strang, R. Reporting to Parents. 

Thomas, R.M. Judging Student Progress. 

Thorndike, R.L., and Hagen, E. Measurement and Evaluation in 
Psychology and Education. 

Wrinkle, W.L. Improving Marking and Reporting Practices in 
Elementary and Secondary Schools. 


CHAPTER 7 


Evaluating Achievement 
with Standardized Tests 


A large number of achievement tests are available from commercial
publishers. These tests are usually referred to as standardized
achievement tests. The word “standardized” is used to designate
the fact that the tests are administered, scored, and interpreted
in a standard way and that the tests are accompanied by
“norms.” Norms are records of the performances made by groups
of individuals who have previously taken the test. They are used
as a means of determining how the score of any individual who
takes the test compares with scores made by other persons.

Standardized achievement tests differ from teacher-made achievement
tests in several ways. Since the standardized achievement test
is designed for use in many different school systems throughout the
country, the content must be based on broad objectives common to
many different courses of study in the area being tested. Teacher-
made achievement tests, on the other hand, are designed to measure
the attainment of objectives in a specific class at a particular time.
Standardized achievement tests are usually more expertly designed
and constructed than the typical teacher-constructed exam. The
financial returns accruing from the sale of tests enable the test-
maker to spend considerable time and money in item writing, in
item selection, and on various kinds of research necessary to
improve the test.

Although teacher-made achievement tests are more appropriate 
for evaluating specific learning within a given class, standardized 
achievement tests can be helpful in providing information regard- 
ing pupils’ status and progress in broad educational objectives. 


Tests of basic skills 


Among the most useful standardized achievement tests are those 
in the area of basic skills, such as reading, language, and arithmetic. 
Since many different schools have similar objectives and teach simi- 
lar materials in the basic skills areas, standardized achievement tests 
in these areas are widely used. However, in spite of the basic agree- 
ment on objectives in the areas, the contents of the various stand- 
ardized achievement tests in these areas vary widely among them- 
selves. The only method of determining whether a particular stand- 
ardized test in any one of the basic skills is suitable for a particular 
school or school system is to examine the test and compare it with
the local objectives.


Tests in content areas 


There are many achievement tests which cover the content of
non-skill areas, such as social studies and science. However, there is
great variation in the content covered in various classes and school
systems in these areas. This fact has made standardized achievement
tests of content unusable in many situations. Recognizing this great
variability in content, test-makers have devised tests which measure
pupils’ ability to apply school learnings to the solution of new
problems. Two tests of this type are described in the next section.
This approach is becoming increasingly popular because there is more
agreement on the over-all broad objectives of education in these
areas than there is on the content of any one course which
contributes to the attainment of the over-all objectives.


Tests of broad educational objectives 


One of the first and most widely used tests of broad educational
objectives is the Iowa Tests of Educational Development (Science
Research Associates). The tests are devised to yield evidence of the
degree to which concepts are understood rather than the degree to
which isolated facts are recalled. The series is designed for grades
8.5 to 13.5 and yields ten scores: understanding of basic social
concepts, general background in the natural sciences, correctness and
appropriateness of expression, ability to do quantitative thinking,
interpretation of reading materials in the social studies,
interpretation of reading materials in the natural sciences,
interpretation of literary materials, general vocabulary, the subtotal
of these eight tests, and using sources of information. The total
series of tests requires approximately eight hours of testing time.

The latest and most ambitious series of tests designed to measure
broad educational objectives is the Sequential Tests of Educational
Progress (Cooperative Test Division, Educational Testing Service).
This series consists of tests of: essay writing, listening
comprehension, reading comprehension, writing, science, mathematics,
and social studies. The tests emphasize measurement of the ability to
apply school-learned skills in solving new problems.¹ Each of the
tests is available for four different levels: grades 4-6, 6-9, 10-12,
and 13-14. The total series of tests requires approximately seven and
one-half hours of testing time.


Achievement test series 


Most of the major publishers publish achievement test series
which include achievement tests in several areas and cover several
different educational levels. Typical achievement test series
are listed in Table 6 on page 78. These series have norms based on
the same or on comparable populations. This fact enables the test
user to compare the performance of a pupil or a class in one area,
such as reading, with the performance in any other subject covered
by the test, e.g., arithmetic.


Diagnostic tests 


Most of the tests referred to in the preceding sections have been
“survey” tests; that is, they cover a broad area and result in a total
score which reflects over-all achievement in the area tested. Thus
teachers can say that a pupil is doing well in arithmetic or doing
poorly in arithmetic; but they do not know why, nor do they know
what arithmetic concepts are causing difficulty. There are tests
which have been devised to provide information about the specific
nature of a pupil's difficulties in given subject areas. These tests
are called diagnostic tests. A diagnostic test in division of whole
numbers may reveal that the pupil is having trouble with problems
involving zero but that otherwise he can perform the division
process adequately. There are a number of good diagnostic tests
on the market, particularly for reading and arithmetic. Any test
can be used as a diagnostic test in a limited way by examining the
students’ performance on the individual items which make up the
test rather than on the test as a whole. However, the typical survey
test does not include sufficient items in any one area to enable the
test to be used successfully as a diagnostic instrument.

¹ See Chapter 2 for typical items from this series.


Test norms 


Standardized tests are given under “standard conditions,” which
means that the same set of directions is given to all students who
take the test and that the same time limits are imposed. Therefore,
the performance of individuals taking the test at widely different
times and places can be compared. To provide a method whereby
an individual’s performance on a standardized test can be compared
with that of other individuals, the standardized test usually
furnishes “norms.” The word norm means average. The “norms” on
a test are the averages of various groups who have taken the test.
By using different kinds of norms, such statements can be made
as, “Johnny reads at the 5.8 grade level” (reads as well as the
average child in the eighth month of the fifth grade), and “Jane reads
at the 70th percentile for ninth grade students” (reads better than
70 percent of the children in the ninth grade). The statement
about Johnny's reading ability was based on “grade norms,” while
the statement about Jane's reading ability was based on “percentile
norms.” Although grade norms are widely used, particularly at the
elementary school level, percentile norms are generally more useful.

The norms furnished with a test are valuable only if a comparison
is to be made with pupils from other schools. Since norms are
based on an average of various regions of the country, large and
small schools, pupils of various backgrounds, and schools with
varying educational objectives, it is very difficult to determine
whether the particular norms furnished with a test are ones which can
reasonably be employed. The words “national norm” are often
heard in connection with standardized tests. The words are
misleading to the extent that there is no one “national norm”
employed by all makers of standardized tests. The norms furnished
with the various achievement test series all differ from each other.
Sometimes the differences are quite large. The only way of
determining whether the norms supplied with a particular test are
appropriate for use in a given situation is to study the information
contained in the manual.


Table 6  Representative Achievement Test Series

California Achievement Tests, 1957 Edition (California Test Bureau)
   Grade levels: Lower Primary (1-2), Upper Primary (3-4),
      Elementary (4-6), Junior High (7-9), Advanced (9-14)
   Tests included: Reading, arithmetic, language.

Coordinated Scales of Attainment (Educational Test Bureau)
   Grade levels: 1-8 (separate form for each grade level)
   Tests included: Reading, arithmetic, and spelling in the forms for
      grades 1, 2, and 3. Language, history, geography, science, and
      literature are added for grades 4 and above.

Essential High School Content Battery (World Book Company)
   Grade levels: 9-13
   Tests included: Mathematics, science, social studies, English.

Iowa Every-Pupil Tests of Basic Skills (Houghton Mifflin Company)
   Grade levels: Elementary (3-5), Advanced (5-9)
   Tests included: Silent reading comprehension, work-study skills,
      basic language skills, basic arithmetic skills.

Iowa Tests of Educational Development (Science Research Associates)
   Grade levels: 8.5-13.5
   Tests included: Understanding of basic social concepts; general
      background in the natural sciences; correctness and
      appropriateness of expression; ability to do quantitative
      thinking; interpretation of reading materials in the social
      studies; interpretation of reading materials in the natural
      sciences; interpretation of literary materials; general
      vocabulary; using sources of information.

Metropolitan Achievement Tests (World Book Company)
   Grade levels: Primary I (1), Primary II (2), Elementary (3-4),
      Intermediate (5-7.5), Advanced (7-9.5)
   Tests included: Reading, vocabulary, and arithmetic included at all
      levels. Spelling for grade 2 and above. Language added for
      grade 3 and above. Geography, history, and science added for
      grade 5 and above.

Sequential Tests of Educational Progress (Cooperative Test Division,
Educational Testing Service)
   Grade levels: Level 4 (4-6), Level 3 (7-9), Level 2 (10-12),
      Level 1 (13-14)
   Tests included: Essay writing, listening comprehension, reading
      comprehension, writing, science, mathematics, social studies.

SRA Achievement Series (Science Research Associates)
   Grade levels: Grades 2-4, Grades 4-6, Grades 6-9
   Tests included: Reading, arithmetic, and language arts included at
      all levels. Language perception added to grades 2-4.
      Work-study skills added for grades 4-6 and 6-9.

Stanford Achievement Tests (World Book Company)
   Grade levels: Primary (1.9-3.5), Elementary (3.0-4.9),
      Intermediate (5-6), Advanced (7-9)
   Tests included: Reading, spelling, and arithmetic included at all
      levels. Language added for elementary battery and above.
      Social studies, science, and study skills added in the
      intermediate and advanced batteries.


If the content of an achievement test is useful for measuring local
objectives, but the norms furnished with the test are not
appropriate, local norms may be developed. These are norms based on
the test scores of pupils in one particular school or school system.
These local norms provide a means of comparing the achievement
of pupils within the local system with each other as well as
comparing new students coming into the school with those already there.
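
One simple way to build a local percentile norm is to ask what percentage of the local pupils scored below a given score. The sketch below follows the "reads better than 70 percent" wording used earlier in this section; the scores are hypothetical, and published tests may compute percentile ranks with somewhat different conventions (for example, in how tied scores are counted).

```python
def percentile_rank(score, norm_scores):
    """Percentage of scores in the norm group that fall below the given score."""
    below = sum(1 for s in norm_scores if s < score)
    return 100 * below / len(norm_scores)


# Hypothetical reading scores of ten pupils already in the local school system.
local_scores = [31, 35, 38, 40, 42, 44, 45, 47, 50, 53]

# A new pupil arrives with a score of 46 on the same test.
print(percentile_rank(46, local_scores))
```

A real set of local norms would of course rest on far more than ten pupils.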


Test profiles 


A test profile is a graphical representation of the scores on various
tests given to an individual. By graphically representing these
scores, it is possible to identify strengths and weaknesses of the
individual student. That is, inspection of the profile may reveal
the fact that an individual is strong in reading but weak in
arithmetic. Before test profiles can be plotted, it is necessary for
the norms for the various tests to be comparable. Unless the norm
populations are comparable, preferably based on the same population,
it is not valid to compare an individual’s standing on one
test with that on a different test. The meaning of “stands at the
70th percentile in reading” is quite different when the norm group
consists of college preparatory high school students in the eleventh
grade than when it consists of all high school students in the
eleventh grade.

Profiles have traditionally been plotted by depicting each of the
separate test scores as a point. This procedure overemphasizes small
differences between achievement in the different areas. A better
procedure is to represent each of the tests by a shaded band
extending one standard error of measurement² above and below the
obtained score. This band serves as a reminder that test scores
are not precise and prevents the overemphasis of small differences
in achievement which may have no educational significance. The
tests profiled on page 81³ illustrate this procedure.

[Figure: score profile from the Cooperative School and College
Ability Tests, with each score plotted as a band.]

Interpretation: Scores profiled here are bands rather than points.
The midpoint of each band shows approximately what percentage of
students in the norming group earned scores lower than the one
profiled. Each band covers two standard errors of measurement, one
above and one below the percentile rank for the score earned. This
means that the chances are 2-to-1 that the student's “true” score
lies within the range ... If the bands of the student’s verbal and
quantitative scores overlap, there is probably no important
difference between the scores. If the two bands do not overlap, the
chances are about 5-to-1 that there is a real difference in measured
ability. (See ... for additional information on interpretation.)
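
The band procedure can be expressed as a small calculation: build a band one standard error of measurement on either side of each obtained score, then check whether two bands overlap. The scores and the standard error used here are hypothetical, and the function names are illustrative only.

```python
def score_band(obtained, sem):
    """Band extending one standard error of measurement below and above the score."""
    return (obtained - sem, obtained + sem)


def bands_overlap(a, b):
    """True when two (low, high) bands share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical obtained scores, each with a standard error of measurement of 3.
verbal = score_band(52, 3)
quantitative = score_band(57, 3)
reading = score_band(65, 3)

print(bands_overlap(verbal, quantitative))  # bands touch: no important difference
print(bands_overlap(verbal, reading))       # bands apart: difference worth noting
```

Plotting the bands rather than the points makes the same comparison visible at a glance.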


EXERCISES 


1. Inspect the items on an achievement test in an area of your
choice, and attempt to list the general objectives of the test-
maker.

2. Outline the objectives of some course you are teaching or expect
to teach in the future. Obtain a standardized achievement test
whose title would lead you to think it might be useful in
evaluating these objectives. Compare your objectives with the content
of the test.

3. In the manual of the test you used in Exercise 2, turn to the
section which describes the sample on which the norms were
based. Is there sufficient description of the sample to enable you
to decide whether this is the type of group with which you want
to compare your students?

4. Compare two well-known standardized achievement tests in your
area to see how they differ. Identify, if you can, apparent
differences the test-makers had in mind when they wrote the test.

5. Interview a counselor in a school which uses a test such as the
Iowa Tests of Educational Development to determine such things
as how often it is given, who takes the test, how the results are
used, and how the school is organized to administer the test.

6. How is it possible for a child in the fifth grade to place at the
eighth-grade level on a standardized arithmetic test, without
being able to work all the types of arithmetic problems taught
at the eighth-grade level?


² For an explanation of the standard error of measurement see
Appendix B, page 109.

³ Reproduced from Examiner’s Manual for Cooperative School and College
Ability Tests by permission of the Cooperative Test Division,
Educational Testing Service, Princeton, New Jersey, and Los Angeles,
Calif.
CHAPTER 8


Evaluating Abilities with 
Standardized Tests 


Closely related to achievement tests is a group of tests known
as intelligence tests or tests of scholastic aptitude. These tests are
designed to measure the pupil’s capacity to profit from school
instruction.

Achievement, intelligence, and aptitude tests are closely related
to each other. Although the three categories traditionally
have been thought of as separate and distinct, the modern view is
to minimize differences between the labels of tests in any of these
areas, and to select and use the tests on the basis of their validity
for the particular job to be done. In a recent article Alexander G.
Wesman¹ points out the overlap between these three “types” of tests.

“By definition, an achievement test measures what the examinee
has learned. But an intelligence test measures what the
examinee has learned. And an aptitude test measures what the
examinee has learned. So far, no difference is revealed. Yet three
of the traditional categories into which tests are classified are
intelligence, aptitude, and achievement. Now these categories are
very handy; they permit publishers to divide their catalogs into
logical segments, and provide textbook authors with convenient
chapter headings. Unfortunately, the categories represent so
much oversimplification as to cause confusion as to what is being
measured. What all three kinds of tests measure is what the
subject has learned. The ability to answer a proverbs item is no
more a part of the examinee’s heredity than is the ability to
respond to an item in a mechanical comprehension test or in a
social studies test. All are learned behavior.

Moreover, all are intelligent behavior. It takes intelligence to
supply the missing number in a number series problem. It also
requires intelligence to figure out which pulley will be most
efficient, or to remember which president proposed an inter-
American doctrine. We can say, then, that an intelligence test
measures intelligent behavior, an aptitude test measures
intelligent behavior and an achievement test measures intelligent
behavior.

Finally, all three types of tests measure probability of future
learning or performance, which is what we generally mean when
we speak of “aptitude.” In business and industry, the chances
that an employee will profit from training or will perform new
duties capably may be predicted by scores on an intelligence
test, by scores on one or more specific aptitude tests, or by some
measure of the degree of skill the employee already possesses.
Similarly, test users in the schools know that an intelligence
test is usually a good instrument for predicting English grades,
a social studies test is often helpful for prediction of future
grades in social studies, and a mechanical comprehension test is
likely to be useful in predicting for scientific or technical courses.
So, intelligence tests are aptitude tests, achievement tests are
aptitude tests, and aptitude tests are aptitude tests.”

¹ Test Service Bulletin #51 (New York: The Psychological Corporation,
December, 1956).


What intelligence tests measure 
Intelligence tests evaluate primarily an individual’s ability to
learn the materials taught in school. This was their original
purpose and continues to be their main function, although they are
also used for other purposes. It is widely thought that intelligence,
as measured by intelligence tests, does not depend on the extent of a
person’s schooling or other experiences. This is a complete
misconception. For example, identical twins exposed to completely
different environments, one rich in opportunities for intellectual
stimulation and the other devoid of such opportunities, can be
expected to vary considerably in their scores on an intelligence test.
The one with the richer environment will test significantly higher on
a typical intelligence test than his less fortunate twin. This fact
must always be kept in mind in interpreting intelligence test scores.

Intelligence tests are designed to predict the ability of pupils to
profit from school instruction, and for this purpose they are fairly
competent, if school instruction is thought of as limited to the
academic areas. However, they are not good predictors of how well
a student may perform in non-academic areas such as woodworking,
art, and music. Neither are they an indication of a person's over-
all worth. Work habits, study habits, perseverance, integrity, and
many other important facets of the individual’s personality are not
revealed by the I.Q. As a result, it is not uncommon to find a pupil
achieving more than his classmate, even though his I.Q. is not as
high as his classmate’s. The intelligence test indicates how well the
child may achieve in school; it does not guarantee that performance.


Types of intelligence tests 


The earliest intelligence tests were devised to be administered to 
one person at a time. This type of test is called an individual 
intelligence test. After intelligence tests came to be generally 
accepted, group tests of intelligence were devised, which resemble 
the standardized achievement test and can be administered to more 
than one pupil at a time. Because of ease of administration and 
the lower cost, most of the intelligence tests given in our schools 
are group tests rather than individual intelligence tests. 

Group intelligence tests depend on reading ability; accordingly, 
pupils who are poor readers will obtain a lower score on a group 
intelligence test than other pupils of equal inherited ability who are 
better readers. There are, however, tests of intelligence that are com- 
pletely non-verbal. These tests consist of performance items and are 
called performance or non-language tests of intelligence. This type 
of test is not as good an indicator of the pupil’s ability to do school 
work as the tests of intelligence which involve the use of language. 
However, they provide a useful means of obtaining evidence about
the intelligence of pupils who have a language problem, either
because they are poor readers or because English is not their
native tongue.

Since group intelligence tests depend heavily upon language,
when a pupil scores very low on a group intelligence test further
evidence should be obtained to shed light upon the question
of whether the low score really reflects a lack of basic ability to do
school work or whether some other factor was operating to lower
the score. This further evidence can consist of an individual
intelligence test, a non-language test of intelligence, or the pupil’s
actual performance in the classroom.

Some intelligence tests yield more than one I.Q., for example, a
verbal I.Q. and a non-verbal I.Q. When such a test is used, large
differences between the two I.Q.’s can reveal the existence of a
language handicap.


Numerical expression of intelligence 

The exact meaning of a numerical score which represents the
I.Q. depends upon the particular test which was used. The term
I.Q. was first used in connection with the Stanford-Binet test of
intelligence. It was defined as

$$\text{I.Q.} = \frac{\text{Mental Age}}{\text{Chronological Age}} \times 100.$$

Thus, if a pupil had a mental age of 120 months, as measured by the test,
and a chronological age of 80 months, his I.Q. would be
$\frac{120}{80} \times 100$, which equals 150. Whenever the mental age
is exactly equal to the chronological age, the I.Q. is 100. Because the
term “I.Q.” was so well known, it was borrowed by other test-makers to
use in reporting scores on their intelligence tests. However, the I.Q.’s
obtained from many of these other tests are not derived by dividing
a mental age by a chronological age.

In spite of the fact that numerical scores derived from different
tests of intelligence are not exactly comparable, they all have a
certain similarity. This fact enables the test-user to arrive at a crude
estimate of a pupil’s ability to do school work, even though the
exact meaning of the test score may not be known. On all tests
which express intelligence in terms of an I.Q., the average score
is approximately 100. However, since the scores are crude measures,
a better way of thinking of the average is to consider any score in
the range from approximately 90 to 110 to be average. No emphasis
should be placed on the exact score earned by an individual; rather
the score should be thought of as reflecting a general level of
capacity to profit from school instruction.
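
The arithmetic of the ratio I.Q. described in this section can be sketched in a few lines of Python. This is only an illustration: the function names `ratio_iq` and `general_level` are hypothetical, and the band limits of 90 and 110 are taken from the approximate "average" range given in the text, not from any test manual. It applies only to tests whose I.Q. is actually a mental age divided by a chronological age.

```python
def ratio_iq(mental_age_months: float, chronological_age_months: float) -> float:
    """Ratio I.Q.: mental age divided by chronological age, times 100."""
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return mental_age_months / chronological_age_months * 100


def general_level(iq: float, low: float = 90, high: float = 110) -> str:
    """Report a broad level instead of stressing the exact score.

    The 90-110 limits are the approximate average range from the text.
    """
    if iq < low:
        return "below average"
    if iq > high:
        return "above average"
    return "average"


# The pupil from the text: mental age 120 months, chronological age 80 months.
example = ratio_iq(120, 80)
print(round(example), general_level(example))
```

Reporting the band rather than the number follows the caution above that the score is a crude measure.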


Homogeneous grouping 


Intelligence test results have sometimes been used to create classes 
whose members are similar in intelligence. For example, suppose 
that there were five fourth-grade classes in an elementary school. 
Strict homogeneous grouping by intelligence would require deter- 
mining the I.Q. of each pupil in the fourth grade and then dividing 
the pupils into five groups, assigning the top fifth of the pupils to 
one class, the next fifth to another class, and so forth. 
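
The procedure just described, ranking the pupils by I.Q. and cutting the ranking into fifths, might be sketched as follows. The pupil names and scores are invented for illustration, and `group_by_iq` is a hypothetical helper, not a procedure taken from any test manual. When the number of pupils does not divide evenly, this sketch gives the extra pupils to the highest groups, which is an arbitrary choice.

```python
def group_by_iq(pupils: dict, n_classes: int = 5) -> list:
    """Rank pupils by I.Q. (highest first) and cut the ranking into n_classes groups."""
    if n_classes <= 0:
        raise ValueError("n_classes must be positive")
    ranked = sorted(pupils, key=lambda name: pupils[name], reverse=True)
    base, extra = divmod(len(ranked), n_classes)
    groups, start = [], 0
    for i in range(n_classes):
        # The first `extra` groups each take one additional pupil.
        size = base + (1 if i < extra else 0)
        groups.append(ranked[start:start + size])
        start += size
    return groups


# Invented scores for ten fourth-grade pupils.
fourth_grade = {
    "Ann": 128, "Bob": 92, "Cal": 105, "Dee": 117, "Eve": 99,
    "Fay": 84, "Gus": 110, "Hal": 101, "Ida": 121, "Jon": 88,
}

for number, members in enumerate(group_by_iq(fourth_grade), start=1):
    print(f"Class {number}: {', '.join(members)}")
```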

The purpose of grouping pupils in this fashion is to reduce the 
variability of ability in each of the classes, and thus make instruc- 
tion and learning both easier and more effective. Current theory 
rejects the idea of homogeneous grouping by intelligence except for 
those pupils who are either so markedly above or below average in 
intelligence that there is a good indication that they probably will 
not profit from instruction in a regular classroom.

Since pupils of the same I.Q. can differ considerably in their
abilities in different subjects, a better method of homogeneous
grouping is to group together for instructional purposes pupils who
are at the same level of development in the subject being taught.
This is what an elementary teacher does when she divides her
class into reading groups.


Using intelligence test scores


The intelligence test provides the teacher with a means of deter-
mining the over-all ability of a pupil to succeed at the more aca-
demic types of school work. However, the following cautions
should be applied in dealing with these test scores.

1. Since there is so much misunderstanding of the I.Q., even by
experts, refrain from attaching an I.Q. to a pupil in discussions
with lay people; and even among members of the profession a
description in behavioral terms of pupil ability is preferred.
2. The I.Q. is not a precise measurement, and should be used
only to indicate the general level of ability to do school work.
3. The I.Q. does not necessarily indicate ability to succeed in
the less academic types of school work. The pupil with a high
I.Q. may not be as outstanding in art, music, physical education,
work involving manual dexterity, or other areas of school work
which are not highly academic in nature, as he is in the more
academic subjects.
4. The results of intelligence tests given more than two or three
years previously should not be relied on too heavily. Since meas-
ured I.Q. depends on the pupil’s experiences, there may be a
considerable shift in measured I.Q. in that time.
5. Personality and character are not completely reflected by the
I.Q. There are many worth-while persons in our society who have
relatively low I.Q.’s, and many persons in prisons who have high
I.Q.’s.
6. Two persons with the same I.Q. may have entirely different
strengths and weaknesses, even in intellectual areas. I.Q. tests
usually include verbal items (such as vocabulary) and quantita-
tive items (such as arithmetic). It is possible for two individuals
to achieve the same total I.Q. score with one individual much
stronger in the verbal area than in the quantitative area, and the
other individual much stronger in quantitative than in verbal.
7. If other available evidence is not consistent with a pupil's
score on an intelligence test, investigate to discover the reason for
the inconsistency. It may be that the I.Q. test given to the child
was, for some reason, not valid.


Tests of special aptitudes 


Although the I.Q. is a measure of the general aptitude of the
student to profit from school instruction, some areas of school
instruction are more closely related to the I.Q. than are others.
This fact has led to the development of tests designed to predict
success in some particular aspect of the curriculum with more
validity than that achieved by using the general intelligence test.
Special aptitude tests have been developed for art, music, algebra,
foreign language, shorthand, reading, and other school subjects.
The most widely used prognostic tests are those of “reading readi-
ness.” These tests are designed to be given to students to determine
whether they are sufficiently mature to profit from formal instruc-
tion in reading.
In recent years a number of aptitude test batteries have appeared
on the market. The batteries consist of a number of tests of differ-
ent aptitudes designed to provide a profile of the pupil's strengths
and weaknesses within the intellectual domain. Their main advan-
tage is that they provide norms for the separate tests in the battery
based on the same population. This permits the comparison of the
relative standing of an individual in the various areas covered by
the tests. Such comparisons cannot be made with tests whose norms
have been developed on different populations. The major problem
is that of determining the validity of each separate score or com-
bination of scores for each purpose for which the battery is to
be used.

Persons contemplating the use of special aptitude tests or aptitude
batteries should carefully examine the evidence about their validity
to determine whether the test or battery is actually an improvement
over a measure of general intelligence.


Table 7 Representative Standardized Tests of Ability

| Test and Publisher | Grade or Age Levels | Type | Scores Reported |
|---|---|---|---|
| California Test of Mental Maturity, 1957 Edition (California Test Bureau) | Grades: Kgn.-1; 1-3; 4-8; 7-9; 9-13; 10-College; Adult | Group | Language I.Q., non-language I.Q., total I.Q. |
| Chicago Tests of Primary Mental Abilities (Science Research Associates) | Ages 11-17 | Group | Number, verbal meaning, space, word fluency, reasoning, memory |
| Differential Aptitude Test Battery (Psychological Corporation) | Grades 8-12 | Group | Verbal reasoning, numerical ability, abstract reasoning, space relations, mechanical reasoning, clerical speed and accuracy, language usage |
| Kuhlman-Anderson Intelligence Test, Sixth Edition (Personnel Press, Inc.) | Grades: Kgn., 1, 2, 3, 4, 5, 6, 7-8, 9-12 | Group | Total I.Q. |
| Otis Quick-Scoring Mental Ability Tests: New Edition (World Book Company) | Grades 1-4; 4-9; High School and College | Group | Total I.Q. |
| Pintner General Ability Tests, Non-Language Series (World Book Company) | Grades 4-9 | Group | Non-language I.Q. |
| Pintner General Ability Tests, Verbal Series (World Book Company) | Grades: Kgn.-2; 2.5-4.5; 4.5-9.5; 9 and above | Group | Total I.Q. |
| Revised Stanford-Binet Scale, 1937 Edition (Houghton-Mifflin) | Ages 2-adult | Individual | Total I.Q. |
| School and College Ability Tests (Cooperative Test Division, Educational Testing Service) | Grades 4-6; 6-8; 8-10; 10-12; 13-14 | Group | Verbal, quantitative, total |
| Terman-McNemar Test of Mental Ability (World Book Company) | Grades 7-12 | Group | Total I.Q. |
| Wechsler Adult Intelligence Scale (Psychological Corporation) | Ages 16 and up | Individual | Verbal I.Q., performance I.Q., total I.Q. |
| Wechsler Intelligence Scale for Children (Psychological Corporation) | Ages 5-15 | Individual | Verbal I.Q., performance I.Q., total I.Q. |


EXERCISES 
1. List several reasons why a pupil might be achieving higher than
his I.Q. would indicate.

2. Obtain a group test of intelligence. Examine the items in the
test to determine what percent of them can be classified as
achievement items.

3. John has an I.Q. of 115, but is failing in Latin. His teacher
has tried to help him but cannot succeed. The school counselor
says that there must be something wrong with the teacher's
methods since anyone with an I.Q. of 115 must be able to learn
Latin. Do you think the counselor is right? Why?

4. If a teacher wanted to group her class for instruction in arith-
metic, which of the following instruments would be most likely
to produce groups alike in their present standing in arithmetic:
an individual intelligence test; a group non-verbal intelligence
test; a group arithmetic achievement test? Why?

5. When Mary was 7 years old her I.Q. was 102; when she was 10
years old her I.Q. was 89; and when she was 13 years old her I.Q.
was 110. How can the differences between these three I.Q.’s be
accounted for? What I.Q. might be obtained if Mary were to
be tested when she was 15 years old?

6. Obtain from school records the I.Q.’s of several pupils who have
been given two or more tests of intelligence. Compare the differ-
ent I.Q.’s for each pupil to determine how constant their I.Q.’s
were as measured by intelligence tests.


SUGGESTED ADDITIONAL READINGS 


Anastasi, A. Psychological Testing. 

Cronbach, L.J. Essentials of Psychological Testing. 

Freeman, F.S. Theory and Practice of Psychological Testing. 

Goodenough, F.L. Mental Testing. 

Thorndike, R.L., and Hagen, E. Measurement and Evaluation in
Psychology and Education.


CHAPTER 9 


Evaluating Interest and Adjustment 
with Standardized Instruments 


In working with pupils it is often helpful to know something
about their interests and adjustment. Observations and inter- 
views, both formal and informal, are the traditional techniques for 
collecting this type of data. As an adjunct to observation and inter- 
view, paper-and-pencil instruments have been developed to obtain 
information about the pupil’s interests and adjustment in a more 
economical way. These paper-and-pencil instruments consist of 
standard sets of questions administered and scored in a standardized 
way. The fact that they resemble standardized tests so closely has 
led some persons to mistakenly classify them as standardized tests, 
whereas the correct name for instruments of this type is question- 
naire or inventory. A test consists of questions which have right or 
wrong answers. Inventories consist of questions which have no right 
answers, but which merely report the student’s feelings, preferences, 
or actions. 

The validity of all standardized inventories or questionnaires 
depends on the pupil’s willingness to reveal himself frankly and his 
ability to see himself as he is. For this reason self-report inventories 
have their greatest use in situations where the individual has no 
reason to fake his answers, since all self-report inventories are
fakable to some degree. When used by persons who are aware of
their limitations, standardized inventories can provide useful facts
about pupils.


Interest inventories 


Interest inventories are designed to investigate the pupil's inter- 
ests in and feeling toward occupations and professions and various 
activities. This evidence then becomes useful in vocational coun- 
seling. It can also be an aid in determining what kinds of class- 
room activities and educational experiences will interest the pupil. 
The major types of interest inventories on the market are illustrated 
by the Strong Vocational Interest Blank, the Kuder Preference 
Record (Vocational), and What I Like to Do. 

The Strong Vocational Interest Blank for Men (Stanford Uni- 
versity Press) is one of the best-known instruments for appraising 
interests. Consisting of over 400 items, the inventory can be scored 
for forty-seven occupations, and for six groups of occupations. A 
separate form of the Strong is available for women and includes 
scoring for twenty-eight occupations. The Strong is most useful 
when a pupil has expressed the desire to see whether his interests 
coincide with the interests of persons in one of the forty-seven 
occupations for which scoring keys are available. 

The Kuder Preference Record-Vocational (Science Research 
Associates) has several different forms. Form B measures nine 
interest areas: mechanical, computational, scientific, persuasive, 
artistic, literary, musical, social service, and clerical. Form C in- 
cludes all of these scales plus an outdoor scale and a verification 
score (used to identify those who have not followed directions or 
who have answered carelessly). These two forms are most useful 
in identifying general occupational areas for further study and 
exploration for pupils who have no definite occupational plans at 
the time they take the inventory. A new form of the Kuder (Form 
D), now available, measures the individual’s interest in specific 
occupations, as does the Strong. At present scoring keys are avail- 
able for twelve specific occupations. 

Both the Strong and the Kuder are used with high school stu- 
dents, college students, and adults. 

A new interest inventory for grades 4 through 7 is the inventory 
What I Like To Do (Science Research Associates). This interest 
inventory measures interests in eight areas: art, music, social studies, 
active play, quiet play, manual arts, home arts, and science. The 
inventory is designed to help teachers in planning suitable instruc-
tional activities and in guiding and counseling individual pupils.

Interest should not be confused with ability. A pupil may be
interested in an activity or an occupation but have little ability in
that activity or occupation. Counselors or teachers should always
consider both ability and interest when counseling pupils.


Adjustment inventories 

Adjustment is defined as the degree of ability to fit into and live 
happily in one’s environment. Numerous commercial instruments 
attempt to measure adjustment. However, this type of inventory 
must be used very cautiously, and at best it gives only clues or 
indications of problem areas.

Typical inventories in this area are the SRA Youth Inventory, 
The Bell Adjustment Inventory, and the Minnesota Multiphasic 
Personality Inventory (MMPI). 

The SRA Youth Inventory (Science Research Associates) helps
identify problems that young people worry about most. The eight
areas covered are: My School, Looking Ahead, About Myself, Get-
ting Along with Others, My Home and Family, Boy Meets Girl,
Health, and Things in General. This inventory is designed for
grades 7 through 12.

The Bell Adjustment Inventory (Stanford University Press)
yields scores in four areas: home, health, social, and emotional. It
also helps identify problems of concern to young people.

The Minnesota Multiphasic Personality Inventory (Psychologi-
cal Corporation) is designed to identify a number of distinct cate-
gories of abnormal behavior. It has ten scales: Hypochondriasis,
Depression, Hysteria, Psychopathic Deviate, Masculinity and Fem-
ininity, Paranoia, Psychasthenia, Schizophrenia, Hypomania, and
Social Introversion. The MMPI represents a type of inventory
which should be used only by persons with advanced training in
psychology. It is mentioned here merely to indicate that instru-
ments of this type exist and to point out the fact that such instru-
ments should not be employed by classroom teachers or others not
especially qualified in their use.


Summary evaluation of interest and adjustment inventories 


Self-report inventories of interest and adjustment are primarily
instruments to be used in understanding and counseling pupils,
although interest inventories do have some use in planning class-
room activities. Both types of inventories are subject to the limi-
tations that the pupil must (1) be willing to give truthful answers
and (2) have self-insight and self-understanding.

Interest inventories are widely used in secondary schools and 
colleges. When used together with measures of ability and other 
information about the pupil, they provide valuable material for 
use in vocational counseling. 

Adjustment inventories have not been adopted as widely as inter- 
est inventories, nor is there as much evidence about their validity. 
Some authorities in the field of measurement and evaluation go so 
far as to recommend that they not be used at all. While this recom- 
mendation may be somewhat too harsh, it does seem clear that 
instruments of this type cannot be used by persons without consid-
erable special training, and probably should not be used by class-
room teachers.


EXERCISES 


1. Obtain a copy of the Strong Vocational Interest Blank and the
Kuder Preference Record-Vocational, and compare the types of
items.

2. List and discuss some precautions that should be taken when
selecting, administering, and interpreting the results of an inven-
tory which is known to be “fakable.”

3. Locate a research study which used an adjustment inventory. 
Note the way the instrument was employed and the precautions 
taken in interpreting the findings. 


SUGGESTED ADDITIONAL READINGS 
Ferguson, L.W. Personality Measurement. 
Thorndike, R.L., and Hagen, E. Measurement and Evaluation in
Psychology and Education.
Vernon, P.E. Personality Tests and Assessments.


CHAPTER 10 


How to Select a Standardized 
Test or Inventory 


Basically the problem of buying a test is the same as that involved
in buying any other item. The first step is to determine the func-
tions or purposes to be served by the test or inventory to be pur-
chased. Next, a number of tests or inventories which seem to be
potentially useful for these purposes must be located. Finally,
the one test or inventory which best seems to fit the purposes is
selected.


Determining the purposes of the test or inventory 


Unless the purposes to be served by a test or inventory can be
stated in concrete terms, there is probably no point in its pur-
chase. Therefore, the first step should be to write down the pur-
pose or purposes to be served by the test or inventory. These
should be as explicit as possible. If an achievement test is needed
to evaluate learnings in some area, the specific learnings which the
test is expected to cover should be listed. If an interest inventory
is being considered for use in a guidance program, a list should be
made of the exact ways in which the inventory is to be used.


Locating likely tests or inventories


After the objectives have been clearly determined, the next step
is to locate tests which potentially are useful in fulfilling the objec-
tives. The best single source of information about tests and inven-
tories is the Fourth Mental Measurements Yearbook.1 This book
contains descriptions of hundreds of published tests as well as
critical reviews of many of them. No search for likely tests is com-
plete until this book has been consulted. Since new tests continually
become available and the Mental Measurements Yearbooks are pub-
lished at irregular intervals, there may be some new tests or inven-
tories which are not yet listed in this source. To locate these new
tests a search should be made in the catalogs of test publishers.
These catalogs are the most complete list of the available tests and
inventories. A list of publishers who issue catalogs is included at
the end of this chapter.


Obtaining specimen copies of the most likely 
tests or inventories 


From the descriptions and reviews in the Mental Measurements
Yearbook and from the descriptions contained in the publishers'
catalogs, the names of those tests or inventories which seem most
likely to fulfill the objectives are determined. Next, specimen sets
of these tests or inventories are ordered from the test publishers.
A specimen set consists of a copy of the test or inventory, an answer
key, a manual, and all other material usually supplied with the test
or inventory. It can be purchased for a nominal sum (usually 50
cents or less). Before ordering the specimen set, check the restric-
tions contained in the test catalog on who may order test materials
to be sure that the order will be filled. Usually school administrators
or college teachers can authorize the purchase of specimen sets.


Evaluating the specimen copies 


The major consideration in selecting a test is its validity, the
extent to which it measures what the user wants it to measure. In
selecting an achievement test its validity can be determined by
analyzing the test, item by item, to determine if the individual
items measure the specified objectives. A simple method of doing
this is to compile a list of objectives and classify each item in the
test according to which objective or objectives it measures. In set-
ting up the list of objectives include a classification of “Irrelevant,”
since some of the items in the test may not measure any of the
objectives and should be considered as irrelevant to the stated objec-
tives. The test which best covers the objectives and has the smallest
percentage of irrelevant items is the best test for the purpose. The
procedure used in selecting an intelligence test, an interest inven-
tory or a measure of typical behavior is somewhat different. In these
cases the validity of the test must be determined through examina-
tion of the empirical (statistical) evidence furnished in the test
manual. In any case, it is advisable for the prospective user to
actually take each test under consideration. This procedure is a
valuable aid in becoming acquainted with a new test.

1 Buros, O. K. (Editor). The Fourth Mental Measurements Yearbook.

If the search reveals several different tests which seem to be
equally valid for the intended purpose, the choice may be made
between them on the basis of reliability, cost, time for administra-
tion, ease of scoring, format, or other secondary considerations
which have a bearing on the selection.
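
The item-by-item classification of an achievement test described in this section amounts to a simple tally, which might be sketched in Python as follows. The objectives, the item classifications, and the helper name `coverage_report` are all invented for illustration. An item classified under no listed objective is counted as "Irrelevant."

```python
from collections import Counter


def coverage_report(item_objectives: dict, objectives: list) -> dict:
    """Tally how many items measure each objective and what percent are irrelevant.

    item_objectives maps an item number to the list of objectives that the
    reviewer judged it to measure; an empty list means "Irrelevant".
    """
    counts = Counter()
    irrelevant = 0
    for judged in item_objectives.values():
        matched = [o for o in judged if o in objectives]
        if matched:
            counts.update(matched)
        else:
            irrelevant += 1
    total = len(item_objectives)
    return {
        "items_per_objective": {o: counts[o] for o in objectives},
        "uncovered_objectives": [o for o in objectives if counts[o] == 0],
        "percent_irrelevant": 100 * irrelevant / total if total else 0.0,
    }


# Invented example: a five-item arithmetic test judged against three objectives.
objectives = ["addition", "subtraction", "word problems"]
test_a = {
    1: ["addition"],
    2: ["addition", "word problems"],
    3: [],                      # measures none of the stated objectives
    4: ["subtraction"],
    5: [],
}

print(coverage_report(test_a, objectives))
```

Running the same tally on each specimen test gives a common basis for comparing how well each covers the objectives and how many of its items are irrelevant to them.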


Free advice on test selection 


lege of Emporia, Emporia, Kansas. 


Bureau of Educational Research and Service, State University of 
Iowa, Iowa City, Iowa. 
Bureau of Publications, Teachers College, Columbia University,
New York 27, New York.


California Test Bureau, 5916 Hollywood Boulevard, Los Angeles 
28, California. 

Center for Psychological Service, George Washington University, 
Washington, D. C. 


Cooperative Test Division, Educational Testing Service, 20 Nassau
St., Princeton, New Jersey; 4640 Hollywood Boulevard, Los
Angeles 27, California.


Educational Test Bureau, Educational Publishers, Inc., 720 Wash- 
ington Avenue, S.E., Minneapolis 14, Minnesota. 


C. A. Gregory Co., 345 Calhoun St., Cincinnati 19, Ohio. 
Houghton Mifflin Co., 2 Park St., Boston 7, Massachusetts. 


Ohio Scholarship Tests, Ohio State Department of Education, 
Columbus, Ohio. 


Personnel Press, Inc., 180 Nassau St., Princeton, New Jersey. 

Psychological Corporation, 552 Fifth Avenue, New York 36, New 
York. 

Public School Publishing Co., 204 West Mulberry St., Bloomington, 
Illinois. 

Science Research Associates, Inc., 57 West Grand Ave., Chicago 10, 
Illinois. 

Sheridan Supply Co., P. O. Box 837, Beverly Hills, California. 

Stanford University Press, Stanford, California. 

C. H. Stoelting Co., 424 No. Homan Ave., Chicago 24, Illinois. 

World Book Co., 313 Park Hill Ave., Yonkers 5, New York. 


EXERCISES 


1. Send for the test catalogs of the following publishers. These are
some of the most important publishers in terms of the number
of tests which they publish.

California Test Bureau
Cooperative Test Division, Educational Testing Service
Educational Test Bureau
Psychological Corporation
Science Research Associates
World Book Company

Study the catalogs and note the variety of standardized tests
and inventories that are available.

2. Select two or three standardized achievement tests in your own
field from the Fourth Mental Measurements Yearbook or from
the publishers’ catalogs and send for a specimen set of each.
Be sure to have your order countersigned by your college instruc-
tor or the test publisher probably will not send the test. It is
a good idea to enclose the money with your order since the
amount involved is usually small, and it is an imposition to
make the company send a bill for such a small amount.

3. When you receive your specimen copies of the tests, take the
tests yourself under the standardized conditions described in
the manual. Carefully study both the tests and the manuals.
If the tests have been reviewed in the Fourth Mental Measure-
ments Yearbook read the reviews and see whether you
agree with the reviewers’ opinions about the tests.


SUGGESTED ADDITIONAL READINGS 
Greene, H.A., Jorgensen, A.N., and Gerberich, J.R. Measurement
and Evaluation in the Secondary School.
Thorndike, R.L., and Hagen, E. Measurement and Evaluation in
Psychology and Education.


APPENDIX A 


An Annotated Bibliography on 
Measurement and Evaluation 


Adams, G.S., and Torgerson, T.L. Measurement and Evaluation for
the Secondary School Teacher. New York: Dryden Press, 1956.
Covers use of teacher-made and standardized measuring devices.
Includes implications for corrective procedures. Separate chap-
ters covering measurement and evaluation in the major subjects
taught in the secondary schools.


Adkins, D.C. Construction and Analysis of Achievement Tests. 
Washington, D.C.: U.S. Government Printing Office, 1947. 
A sound treatment of objective test construction. Written for 
the United States Civil Service Commission. 


Anastasi, A. Psychological Testing. New York: The Macmillan 
Company, 1954. 
Standard college text in psychological testing. Includes basic 
principles of test and measurement theory and discussions of 
representative standardized tests. 


Arny, C.B. Evaluation in Home Economics. New York: Appleton- 
Century-Crofts, Inc., 1953. 
Methods of measuring and evaluating student progress with
special emphasis on and examples from the field of home
economics.


Bean, K.L. Construction of Educational and Personnel Tests. New
York: McGraw-Hill Book Co., 1953.
Emphasizes the development of better tests. Includes sug-
gestions for preparing objective test items and essay questions.


Bloom, B.S. A Taxonomy of Educational Objectives. New York: 
Longmans, Green & Co., 1956.
A scholarly classification of educational objectives, together
with illustrations of how the achievement of these objectives
may be measured.


Buros, O.K. (Editor). The Fourth Mental Measurements Yearbook.
Highland Park, New Jersey: Gryphon Press, 1953.
Lists and reviews the most commonly used standardized tests
and inventories. The most important source of information
about standardized tests and inventories.


Clarke, H.H. Application of Measurement to Health and Physical
Education. New York: Prentice-Hall, Inc., 1950.
Covers methods of measuring and evaluating performance and
knowledge in the field of health and physical education.


Cronbach, L.J. Essentials of Psychological Testing. New York:
Harper & Brothers, 1949.
College text on psychological testing. Sound presentation of
basic measurement theory. Covers basic concepts, tests of
ability, and testing of typical performance.


Ferguson, L.W. Personality Measurement. New York: McGraw-Hill
Book Co., 1952.
Discusses methods and representative tests and devices used in
evaluating personality.


Freeman, F.S. Theory and Practice of Psychological Testing (Re-
vised Edition). New York: Henry Holt and Co., 1955.
s theory and appli- 
lity and personality. 
e of intelligence tests. 


y statistics as applied to psycholog 
and education. 


Gerberich, J.R. Specimen Objective Test Items. New York: Long-
mans, Green & Co., 1956.
Primarily a collection of over 227 objective test items represent-
ing a wide variety of subjects and objectives. Includes specimen
objective test items for evaluating skills, knowledges, concepts,
understandings, applications, activities, appreciation, attitudes,
interests, and adjustment. A rich source of ideas for the con-
struction of objective test items.


Goodenough, F.L. Mental Testing. New York: Rinehart & Co., 1949. 
Includes sections on historical orientation, principles and 
methods, tests and scales, and applications. Emphasizes the im- 
portance of research in measurement. 


Greene, H.A., Jorgensen, A.N., and Gerberich, J.R. Measurement
and Evaluation in the Elementary School (Second Edition).
New York: Longmans, Green & Co., 1953.

College text on measurement and evaluation. Covers teacher- 
made and standardized measures. Special chapters devoted to 
the evaluation of achievement in different areas of the elemen- 
tary school curriculum. 


—— Measurement and Evaluation in the Secondary School (Sec- 
ond Edition). New York: Longmans, Green & Co., 1954. 
Secondary school version of the book listed above. Has much
material that is also contained in the Elementary volume. In- 
cludes separate chapters on each major separate subject taught 
in the secondary schools. 


Henry, N.B. (Editor). Measurement of Understanding. Forty-fifth
Yearbook, Part I, National Society for the Study of Education. 
Chicago: The University of Chicago Press, 1946. 

Emphasizes the nature of understanding and techniques for 
evaluating understanding. Includes sample test items for meas- 
uring understanding in many different subject areas. 


Lindquist, E.F. A First Course in Statistics (Revised Edition). 
Boston: Houghton Mifflin Co., 1942. 
An introductory text in educational statistics. 


—— (Editor). Educational Measurement. Washington, D. C.: 
American Council on Education, 1951. 
The most comprehensive and authoritative book on achieve- 
ment test construction available. Chapters written by 20 
authorities in measurement and evaluation. Includes sections
on the functions of measurement in education, the construction
of achievement tests, and measurement theory.


Magnuson, H.W., et al. Evaluating Pupil Progress. Bulletin of the 
California State Department of Education, XXI:6, 1952.
Emphasizes the informal measurement and evaluation techniques
that can be used by the classroom teacher.


Micheels, W.J., and Karnes, M.R. Measuring Educational Achievement.
New York: McGraw-Hill Book Co., 1950.
Covers standardized and teacher-made measures of ability and
personality. Of particular value in coverage of product and
procedures measurement. Good coverage of measurement and
evaluation in industrial arts.


Monroe, W.S. (Editor). Encyclopedia of Educational Research
(Revised Edition). New York: The Macmillan Company, 1950.
Contains authoritative articles on many aspects of measurement
and evaluation.


Odell, C.W. How to Improve Classroom Testing. Dubuque, Iowa:
William C. Brown Co., 1953.
A relatively non-technical book on the construction and use of
teacher-made tests. Covers objective and essay tests.


Remmers, H.H., and Gage, N.L. Educational Measurement and
Evaluation (Revised Edition). New York: Harper & Brothers,
1955.
A standard text on measurement and evaluation. Covers
teacher-made and standardized instruments. Includes discussions
on evaluation of environment and background and
physical aspects of pupils as well as the more usual topics
usually included in such texts.


Ross, C.C., and Stanley, J.C. Measurement in Today’s Schools
(Third Edition). New York: Prentice-Hall, Inc., 1954.
A standard text on educational measurement. Includes sections
on the problem of measurement, the construction of
teacher-made tests, the testing program, and measurement in
instruction.


Smith, G.M. A Simplified Guide to Statistics for Psychology and
Education (Revised Edition). New York: Rinehart & Co., 1946.
As the title implies, a simple treatment of basic statistics used
in psychology and education.


Staff, Division on Child Development and Teacher Personnel,
Helping Teachers Understand Children. Washington, D. C.: 
American Council on Education, 1945. 

Covers and illustrates the use of informal evaluation techniques 
used in studying children. 


Strang, R. Reporting to Parents. New York: Bureau of Publications, 
Teachers College, Columbia University, 1947.
Covers a variety of methods of reporting pupil progress to 
parents. 


Thomas, R.M. Judging Student Progress. New York: Longmans, 
Green & Co., 1954. 
Intended for the elementary classroom teacher. Covers both 
teacher-made and standardized measurement and evaluation 
techniques, but emphasizes the teacher’s own evaluation meth- 
ods. Provides many practical illustrations. 


Thorndike, R.L., and Hagen, E. Measurement and Evaluation in 
Psychology and Education. New York: Wiley & Sons, 1955. 
Covers theory and use of teacher-made and standardized meas- 
urement and evaluation techniques. 


Torgerson, T.L., and Adams, G.S. Measurement and Evaluation.
New York: Dryden Press, 1954.
Emphasizes measurement and evaluation for the elementary
school teacher with implications for corrective procedures.
Includes sections on the evaluation of specified subject areas.
Divided into four parts: the evaluative process, the study of
individuals, the improvement of instruction, and administrative
and supervisory aspects.


Travers, R.M.W. Educational Measurement. New York: The Mac- 
millan Company, 1955. 
Consists of four parts: background for educational measurement,
measuring the intellectual outcomes of education, measuring
personality development, and predicting pupil progress.
Includes much of the material contained in the author's other 
book listed below. 


——— How to Make Achievement Tests. New York: Odyssey Press, 
1950. 


A guide to the construction of teacher-made achievement tests. 
Vernon, P.E. Personality Tests and Assessments. New York: Henry
Holt & Co., 1954. 
A description and critical analysis of the tests and non-test 
methods employed in assessing personality. 


Wrinkle, W.L. Improving Marking and Reporting Practices in 
Elementary and Secondary Schools. New York: Rinehart & Co., 
1947. 


A discussion of marking and reporting practices with emphasis 
on their improvement. 


APPENDIX B 


More About Validity 
and Reliability 


Test validity 


A measuring device is valid if it measures what it is supposed to 
measure. Validity is sometimes referred to as the truthfulness of
a measure. It is the most important characteristic of a test. If a 
test does not truly measure what it is supposed to measure, it is of 
no value, regardless of its other good features. Sometimes a test is 
referred to as a “valid test.” In fact a test cannot be said to be valid 
in a general sense. It can only be valid for a particular purpose or 
purposes. A test which is a valid measure of achievement in social
studies in the fifth grade in one community may not be a valid test
of social studies in the fifth grade in an adjoining community. 

There are two main types of validity. The first type is logical 
or rational validity. This type of validity is established by inspect- 
ing the test itself to determine the extent to which the items in the 
test correspond to the objectives in the course or unit that is being 
evaluated. The second type is statistical or empirical. This type of 
validity is employed when a measuring device is being used to 
predict some kind of behavior. The measure of validity used here 
is generally a correlation coefficient between the test scores and the 
scores obtained from the behavior which we are interested in
predicting. This correlation coefficient is called a validity coefficient.
Validity coefficients theoretically can take on the whole range of
values from 1.00 (perfect positive correlation) through .00 (no
correlation) to -1.00 (perfect negative correlation). Validity
coefficients of more than .50 are rare when a test is correlated with
some criterion other than another test. This correlation still leaves
a great deal to be desired when a test is used to predict what a
person may do at some future time. For this reason test scores
should always be interpreted as clues which may help predict future
behavior rather than absolute predictors of behavior.

Most tests used by classroom teachers are achievement tests. Tests
of this type rely primarily on logical rather than empirical methods
of determining the validity of the test for a particular purpose. In
evaluating the validity of an achievement test, the proper procedure
is to make an outline of the objectives and content of the course or
unit which is being tested, and then compare these objectives and
content with the test in question.
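
To make the idea of a validity coefficient concrete, the following is a minimal sketch in Python of how a correlation coefficient between test scores and criterion scores might be computed. The function name `correlation` and both lists of scores are invented for illustration only; they are assumptions and do not come from any actual test or from this book.

```python
from math import sqrt


def correlation(xs, ys):
    """Product-moment correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the two means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each set of scores.
    squares_x = sum((x - mean_x) ** 2 for x in xs)
    squares_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(squares_x * squares_y)


# Hypothetical data: scores on a test and later grade averages (the criterion).
test_scores = [52, 61, 70, 48, 66, 75]
criterion_scores = [2.1, 2.4, 3.0, 2.6, 2.2, 3.3]

validity_coefficient = correlation(test_scores, criterion_scores)
print(round(validity_coefficient, 2))
```

The result always falls between -1.00 and 1.00, which is the range of values described above.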


Reliability 


The precision or consistency of a test instrument is referred to as
its reliability. The precision of a test can be described by the
standard error of measurement. The consistency of a test can be
described by the reliability coefficient.

The standard error of measurement reflects the extent to which
repeated measurements of the same thing tend to cluster together.
If the means of measurement are precise, the repeated measurements
will tend to be similar to each other, and the standard error of
measurement will be small. If the means of measurement are not
precise, the repeated measurements will not be as similar, and the
standard error of measurement will be large.

If a boy actually weighs 95 pounds, and if he is weighed on an
accurate scale 100 times in rapid succession, each of the weighings
will be very close to his true weight. Sometimes the scale may read
96 pounds and sometimes 94 pounds, but seldom, if ever, will it
deviate from the true weight by more than one pound. If the same
boy has a true I.Q. of 95 and if he is tested 100 times in rapid
succession,¹ there would be considerably greater diversity of I.Q. scores
than of weights. The exact amount of scatter of the obtained scores
would be expressed in terms of the standard error of measurement.

¹This is a hypothetical example to illustrate the meaning of the standard
error of measurement. In practice, it is not possible to retest the same boy
100 times.

Statisticians consider any measurement made to be only an esti- 
mate of the true measurement because they realize that some error 
is involved in all measurement procedures. Because of a relation- 
ship between the standard error of measurement and the “normal 
curve” it is possible to make probability statements about the true 
score of an individual based on the obtained score on a test and the 
size of the standard error of measurement. It is known that if the
standard error of measurement is added to and subtracted from an
obtained measure, the chances are two out of three that the true
score will be contained within the interval so formed. Thus, if it is
known that the standard error of measurement on an intelligence
test is 6 points² and the score obtained by testing a pupil is 95, it is
possible to form the interval 89-101 (by subtracting 6 from 95 and
by adding 6 to 95) and to state that the chances are two out of three
that the “true I.Q.” of the student is somewhere between 89 and
101. There remains, of course, still one chance out of three that the
true I.Q. is somewhere outside of these limits, either greater than
101 or less than 89. It is also known that if twice the standard
error of measurement is added to and subtracted from an obtained
score, a range of scores is obtained which will contain the true score
nineteen out of twenty times. Thus, in the example above, the
chances are nineteen out of twenty that the true I.Q. of the student
is between 83 and 107.
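
The arithmetic in this example can be sketched in a few lines of Python. The function name `true_score_interval` and its `multiplier` parameter are hypothetical names chosen for this illustration; the figures (an obtained score of 95 and a standard error of measurement of 6) are those used in the text.

```python
def true_score_interval(obtained_score, standard_error, multiplier=1):
    """Return (low, high): the obtained score minus and plus
    multiplier times the standard error of measurement."""
    margin = multiplier * standard_error
    return obtained_score - margin, obtained_score + margin


# One standard error either side: chances are about two out of three.
print(true_score_interval(95, 6))                 # (89, 101)

# Two standard errors either side: chances are about nineteen out of twenty.
print(true_score_interval(95, 6, multiplier=2))   # (83, 107)
```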

The second method of reporting the reliability of a test is the
reliability coefficient. This coefficient is a correlation coefficient
computed between scores obtained on two testings of the same
group.³

This coefficient reflects the degree to which the scores for the
group tested tended to agree on the two occasions. If the scores
on the two testings were in perfect agreement (an improbable
situation), the reliability coefficient would be equal to 1.00. A reliability
coefficient of .00 would indicate that there was no consistency of
measurement, that there was no tendency for persons’ scores on the
two testings to be close to each other.


²Information about the size of a standard error of measurement is usually
furnished by the author of the test and will be found in the manual that
accompanies the test.

³There are several different methods of computing the reliability of a test, but
a discussion of these methods is beyond the scope of this book. See any of the
standard texts for further discussion of this topic.
Reliability coefficients are usually much higher than validity
coefficients. A reliability coefficient of .70 is low. A validity coefficient
of .70 is very high. The question of how high a correlation
coefficient must be is difficult to answer. A test of relatively low reliability
(.60 to .70) can be useful if it is used to make comparisons between
groups, since errors in the scores of individuals tend to cancel each
other out. Thus, if a teacher wanted to compare the reading ability
of two different classes, a short test of low reliability could be used.
If, however, a decision had to be made about the grade placement
of an individual pupil in the class, a test of higher reliability would
be needed. Tests used to make decisions about individual pupils
should have high reliability.
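
The claim that individual errors tend to cancel out in a group average can be illustrated with a small simulation. This is only a sketch under stated assumptions: the class size, the spread of the errors, and the use of independent, normally distributed errors are all invented for the illustration and are not taken from any real test.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch gives the same result each run

CLASS_SIZE = 30       # assumed number of pupils in a class
ERROR_SPREAD = 10.0   # assumed spread of measurement error, in score points
TRIALS = 200          # number of simulated testings of the class

individual_errors = []
class_mean_errors = []
for _ in range(TRIALS):
    # Each pupil's obtained score misses the true score by a random error.
    errors = [random.gauss(0, ERROR_SPREAD) for _ in range(CLASS_SIZE)]
    individual_errors.extend(abs(e) for e in errors)
    # The error in the class average is the average of the pupils' errors.
    class_mean_errors.append(abs(mean(errors)))

typical_individual_error = mean(individual_errors)
typical_class_mean_error = mean(class_mean_errors)
print(typical_individual_error, typical_class_mean_error)
```

Under these assumptions the typical error in a class average comes out much smaller than the typical error in a single pupil's score, which is why a test of modest reliability may serve for comparing groups but not for decisions about individuals.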


Relationship between validity and reliability 


Test-makers who have no real evidence about the validity of their
tests sometimes emphasize the high reliability of the test. It is
therefore important to understand the relationship which exists between
the validity and reliability of a measuring device. This relationship
is expressed as follows: a test may be reliable without being valid,
but a test cannot be valid without being reliable. Therefore,
reliability is a necessary prerequisite to validity, but does not in itself
guarantee validity. In fact, it is possible for a measure to be
extremely reliable and yet possess no validity whatsoever for a
particular purpose. For example, the circumference of the head
can be measured with great precision; however, this measure has
been shown to have no significant correlation with intelligence,
and thus is not a valid measure for the purpose of estimating
intelligence. This does not mean that the circumference of the head
is not a valid measure in general. After all, it is a perfectly valid
measure for determining the size hat a person should buy.
Cronbach, L.J. Essentials of Psychological Testing.
Goodenough, F.L. Mental Testing.
Lindquist, E.F. (Editor). Educational Measurement.
Thorndike, R.L., and Hagen, E. Measurement and Evaluation in
Psychology and Education.
Ability
evaluating with standardized tests,
84 ff.
representative standardized tests 
of, 90 
Achievement 
evaluating with products and 
performances, 44 ff. 
evaluating with standardized 
tests, 74 ff. 
evaluating with teacher-devised 
essay tests, 38 ff. 
evaluating with teacher-devised 
short-answer tests, 14 ff.
Achievement, pupil, reporting of,
Achievement tests 
of basic skills, 75 
of broad educational objectives, 
of content areas, 75
diagnostic, 76 f. 
relationship to tests of intelli- 
gence and aptitude, 84 f. 
series of, 76
standardized, 74 ff.
table of representative, 78 f. 
Adams, G. S., 57, 83, 103, 107 
Adjustment inventories, 94 ff.
Adkins, D. C., 37, 103 
Aims, operational, definition of, 9 
Anastasi, A., 93, 103 
Anecdotal records, 1, 52 f.
Anecdotes, examples of, 53 


Answer column, 34 
Answer sheets, 34 
Aptitude 

scholastic, 84 

special, 89 f. 

test batteries, 89 f. 

tests, relationship to achievement 

and intelligence, 84 f. 

Arithmetic 

blueprint for test, 16 

7th grade course in, 6 
Arny, C. B., 103 


Bean, K. L., 104 

Bell Adjustment Inventory, 96 
Bloom, B. S., 13, 104 

Buros, O. K., 99, 104 


California Achievement Tests, 1957 
edition, 78 
California Test of Mental Maturity, 
1957 edition, 90 
Catalogs of tests, 100 
Check lists, 46 ff. 
for reporting to parents, 59 f. 
suggestions for devising, 48 f. 
Chicago Tests of Primary Mental 
Abilities, 90 
Clarke, H. H., 104 
Completion items and direct ques- 
tions, 32 f. 
Consistency, 10 
Content areas, tests in, 75


Coordinated Scales of Attainment, 
78 


Course development, framework 
for, 5 


Cronbach, L. J., 93, 104, 112 
Cumulative records, 61 


Diagnostic tests, 76 f. 

Differential Aptitude Test Battery, 
90 

Direct questions and completion 
items, 32 f.

Distractor, 18 


Educational objectives, tests of, 
75 f.
Essay items, 32 
to measure complex understand- 
ings, 40 
Essay tests 
advantages of, 39 
construction of, 40 
disadvantages of, 36 


evaluating achievement with, 
38 f. 
purposes of, 38 f.
scoring of, 41 f. 
Essential High School Content 
Battery, 78 
Evaluating 


abilities with standardized tests, 
84 ff. 

achievement through products 
and performances, 44 ff. 

achievement with standardized 
tests, 74 ff. 

achievement with teacher-devised 
essay tests, 38 ff. 
achievement with teacher-devised
short-answer tests, 14 ff. 

adjustment, 94 ff. 

interest, 94 ff. 

typical behavior through 
observation, 51 f. 

typical behavior with teacher- 
devised instruments, 50 ff. 

Evaluation 

comprehensive and continuous, 
11 

essential characteristics of meas- 
urement procedures used in, 
9 f.

key steps in process of, 7 f. 

meaning of, 1

relationship between activities, 
objectives, and, 5 f.

Evaluation and Advisory Service, 

Educational Testing Service, 
100 


Ferguson, L. W., 97, 104 
File of tests and test items, 36 
Freeman, F. S., 93, 104 


Gage, N. L., 13, 37, 43, 49, 106 
Garrett, H. E., 104 
Geometry design, rating scale for, 
48 
Gerberich, J. R., 37, 83, 102, 104, 
105 
Goodenough, F. L., 93, 105, 112 
Grades 
basis for assigning, 61 ff. 
collecting evidence for, 67 
Grading, see Marking 
Green, H. A., 83, 102, 105 
Grouping, homogeneous, 88
Guessing on objective tests, 35 
Guess-who questionnaire, sample of, 
56 
Guess-who technique, 55
Guidance, measurement and 
evaluation in, 2 


Hagen, E., 37, 73, 83, 93, 97, 102, 
107, 112 

Halo error, 49 

Henry, N. B., 20, 37, 105 

Homogeneous grouping, 88 


I.Q. formula, 87
Informal observations, 51 
Informal reports of pupils, 54 
Instruction, measurement and 
evaluation in, 3 f. 
Intelligence tests, 85 ff. 
group, 86 
individual, 86 
performance, 86 
relationship to tests of achieve- 
ment and aptitude, 84 f. 
types of, 86 f. 
using scores of, 88 
what they measure, 85 f. 
Interest 
evaluating, 94 f. 
inventories, 95 f. 
Inventory, 94 
adjustment, 96 f.
interest, 95 ff.
Iowa Every-Pupil Tests of Basic 
Skills, 78 


Iowa Tests of Educational Development,
75, 78
Jorgensen, A. N., 83, 102, 105 


Karnes, M. R., 49, 106 

Kuder Preference Record 
(Vocational), 95

Kuhlman-Anderson Intelligence 
Test—Sixth Edition, 90 
Letters to parents, 59
Limited sampling, 40 
Lindquist, E. F., 13, 37, 43, 105, 112 


Magnuson, H. W., 57, 106
Marking 
grading on the curve, 71 
improving marking and grading 
practices, 67 f. 
need for, 58 
nine-point system of marks, 69 f. 
quantitative aspects of, 68 ff. 
summarizing grades, 72 f. 
weighting test scores, 71 f. 
Matching items, 29 
examples of, 30 ff. 
suggestions for writing, 30 f. 
Mean, 72 
Measurement, meaning of, 1 
Measurement and evaluation 
in guidance, 2 
in instruction, 3 f. 
Measuring instruments, usefulness 
of, 10 
Median, 72 
Mental Measurements Yearbooks, 
99, 104 
Metropolitan Achievement Tests, 78 
Micheels, W. J., 49, 106 
Minnesota Multiphasic Personality 
Inventory, 96 
Monroe, W. S., 106 
Multiple-choice items, 17 ff. 
examples of, 18 ff. 
for young children, 18 
measuring understanding with, 19 
suggestions for writing, 29 
variations of, 19 
Norms, 74, 77 
grade, 77 
percentile, 77 
national, 80 
Objective test
administering, assembling, and 
scoring, 34 
analyzing results of, 35 f. 
see also Short-answer tests
Objectives 
feasibility and practicability of, 5 
relationship between activities, 
evaluation, and, 5 f. 
tests of broad educational, 75 f.
Observation 
evaluating typical behavior 
through, 51 f. 
informal, 52 f.
planned, 51 f. 
Odell, C. W., 37, 106 
Oral reports, evaluating, 44 
Otis Quick-Scoring Mental Ability 
Tests: New Edition, 90 


Parent-teacher conferences, 59 
Performances and products, 
evaluating, 44 ff. 

Pintner General Ability Tests— 
Non-Language Series, 90 
Pintner General Ability Tests— 

Verbal Series, 90 
Plan sheet, 6 
Planned observations, 51 
Products and performances, 
evaluating, 44 ff. 
Profiles, test, 80 ff. 
examples of, 81 


Progress report
example of, 62 ff.
Publishers of tests, 100
Pupil reports, informal for
obtaining information, 54


Questionnaire, 94 
guess-who, 55 
Rating scale, 46 f., 51
suggestions for devising, 48 f.

Reading readiness, tests of, 89 
Records, cumulative, 61 
Reliability, 10, 110 ff. 

coefficient, 110 ff. 

of scoring essay tests, 42 

relationship to validity, 112
Remmers, H. H., 13, 37, 43, 49, 106
Report cards, 60 f.
Reporting pupil progress, 4, 58 ff.
Reporting to parents 

methods of, 58 ff. 
Ross, C. C., 37, 43, 106 


SRA Achievement Series, 79 
SRA Youth Inventory, 96
Sampling, inadequate in essay tests,
39
Scholastic aptitude, 84
School and college ability tests,
82, 91
Scoring 
essay tests, 41 f. 
objective tests, 34 
Scoring key, 35 
Selection type items and supply 
type items, 17
Sequential tests of educational 
progress, 19, 76, 79 
Short-answer test 
assembling, administering, and
scoring, 34
evaluating achievement with
teacher-made, 14 ff.
planning the, 14 f.
Short-answer items, suggestions for 
writing, 33 f. 


Skills, basic, tests of, 75
Smith, G. M., 106
Sociometry, 54
Sociogram, 54 f.
Sorting method of scoring essay
tests, 41 
Standardized tests, 74 
of ability, 84 ff. 
of achievement, 74 ff. 
advice on selection of, 100 
diagnostic, 76 f. 
how to select, 98 ff. 
specimen copies of, 99 
Standard error of measurement, 
82, 110 f. 
Stanford Achievement Tests, 79 
Stanford-Binet Scale, revised, 1937 
edition, 91 
Stanley, J. C., 37, 43, 106 
Stem, 18 
Strang, R., 73, 107 
Strong Vocational Interest Blank, 
95 
Supply type item, 32 ff. 
advantages of, 33 
and selection type items, 17 
suggestions for writing, 33 


Terman-McNemar Test of Mental
Ability, 91 

Test, objective, see Short-answer 
tests 

Test construction, blueprint for, 16 

Test norms, 77 f. 

Test profiles, 80 f. 

Test selection, 100 


Tests, catalogs of, 100 
Tests, standardized, see 
Standardized tests 
Tests and measurements, 1 
Tests and test items, file of, 36 
Thomas, R. M., 13, 49, 57, 73, 107 
Thorndike, R. L., 37, 73, 83, 93, 97, 
102, 107, 112 
Time sampling, 52 
Torgerson, T. L., 57, 83, 103, 107 
Travers, R. M. W., 13, 37, 107 
True-false items, 31 f. 
Typical behavior 
evaluating, 50 ff. 
meaning of, 50 f. 


Validity, 10, 109 ff. 

coefficient, 110 
Wechsler Adult Intelligence Scale,
91 
Wechsler Intelligence Scale for 
Children, 91 
Weighting 
items in rating scale, 47 
test scores, 71 
Wesman, A. G., 84 
“What I Like to Do,” 95 
Work habits, development of, 8 
Wrinkle, W. L., 73, 108 


